Abstract

In this paper we discuss information-theoretic tools for obtaining optimized coarse-grained
molecular models for both equilibrium and non-equilibrium molecular simulations. The latter are
ubiquitous in physicochemical and biological applications, where they are typically associated with
coupling mechanisms, multi-physics and/or boundary conditions. In general the non-equilibrium
steady states are not known explicitly as they do not necessarily have a Gibbs structure.

The presented approach can compare microscopic behavior of molecular systems to parametric
and non-parametric coarse-grained models using the relative entropy between distributions on the
path space and setting up a corresponding path-space variational inference problem. The methods
can become entirely data-driven when the microscopic dynamics are replaced with corresponding
correlated data in the form of time series. Furthermore, we present connections and generalizations
of force matching methods in coarse-graining with path-space information methods. We demonstrate
the enhanced transferability of information-based parameterizations to different observables,
at a specific thermodynamic point, due to information inequalities.

We discuss methodological connections between information-based coarse-graining of molecular
systems and variational inference methods primarily developed in the machine learning community.
However, we note that the work presented here addresses variational inference for correlated time
series due to the focus on dynamics. The applicability of the proposed methods is demonstrated
on high-dimensional stochastic processes given by overdamped and driven Langevin dynamics of
interacting particles.
1. Introduction

Molecular dynamics simulations at the microscopic (e.g. atomistic) level are capable of
providing quantitative information about rheological, mechanical, chemical and electrical properties
of molecular systems, [48], [21]. However, the enormous range of length and time scales involved
in such complex materials presents a challenging computational task, in particular due to a wide
disparity of relaxation times.

A standard methodology for overcoming the problem of long relaxation times in complex
systems is to abandon the chemical detail and describe the molecular system by fewer (the most
relevant) degrees of freedom. The choice of the latter depends entirely on the physical problem
under question. Such particle-based, systematic coarse-grained (CG) models of molecular systems
are developed by averaging out the details at the molecular level, and by representing groups
of atoms by a single CG particle. Then the effective coarse-grained interaction potentials (more
precisely free energies) are derived from the microscopic details of the atomistic model. The
coarse-grained potentials and force fields can be derived through different methods, such as the inverse


Boltzmann method, force matching and relative entropy, [^, [5j, [70|, l27|, [SS], [3J, [69j, [l^ . Applying 


these methods in the context of best-fit procedures in parametrized families of CG models, the
structural properties of systems at equilibrium can be described with an accuracy that is related to
the metric used for the parameter fitting procedure. However, the above-mentioned coarse-graining
parametrization techniques do not address dynamical properties of the model and are restricted to
systems already at an (equilibrium) Gibbs state.

Furthermore, there are several important issues related to systematic CG models using
microscopic information for molecular systems under non-equilibrium conditions: (a) the whole approach
is based on the fact that there is a direct connection between structural properties (like pair
distribution functions) and CG interaction potentials; i.e., the renormalization group map or Boltzmann


relation, see for instance [5,‘ll. 172. l27l|. This is certainly true at equilibrium and near to equilibrium 


but may not be the case for systems far from equilibrium; (b) since the CG interaction potential
intrinsically involves entropy, it is not clear how the effective CG force field depends on the
external forces (if they are present); (c) it is not clear how to predict the dynamics (or how to
incorporate the proper friction in the equations of motion) in the CG non-equilibrium model, [23].
All these aspects are, in principle, relevant in any application of a systematic CG model
for a molecular system under non-equilibrium conditions.

Recently, several methods for coarse-graining of stochastic models based on information theory 
have appeared in the literature, 0,i0. These methods employ entropy-based techniques that 
estimate discrepancy between (probability) measures. Using entropy-based analytical tools has 
proved essential for deriving rigorous results for the passage from interacting particle models to
mean-field descriptions, e.g., [47]. Applications of these methods to the error analysis of coarse-graining of
stochastic particle systems have been introduced in [43, l39|, [dj, [dg, l40|, l41[ . Independently of such 
rigorous mathematical work, the engineering community developed entropy-based computational 
techniques that are used for constructing approximations of coarse-grained potentials for models 
of large biomolecules and polymeric systems (fluids, melts), where the optimal parametrization of
effective potentials is based on minimizing the relative entropy between equilibrium Gibbs states, 
e.g., 0,0,0,ii. Note that other works in the literature are primarily based on observable-matching,
using either structural distribution functions, as in the inverse Boltzmann method and the inverse
Monte Carlo methods, [7] [52], or averaged forces on CG particles, [34]. These methods were used
with great success in coarse-graining of macromolecules, see, e.g., [72], [58], [27], [28], [29],
[25], [^. Recent review articles [25], [67] give a detailed overview of coarse-graining
techniques applied to systems at equilibrium. Finally, effective coarse equilibrium dynamics for
systems with temporal scale separation modeled by overdamped Langevin dynamics were studied
in 0.

Evolution of coarse-grained variables corresponding to Hamiltonian microscopic dynamics can
be described exactly with the Mori-Zwanzig formalism, leading to a stochastic integro-differential
system with strong memory terms, known as the generalized Langevin equation (GLE), [75], [57],
which is in principle computationally intractable. Therefore either a scaling of CG dynamics or
approximations of the GLE are used, 0,0,0,0,0 EH, 0,0,0.

Approximate dynamical models in a parametrized formulation have also been considered in 
recent studies, most of them based on the well established equilibrium parametrization methods 
described above. For example, the authors of 3a, [10] propose optimal CG parametrized Langevin
dynamics based on the force matching method.

In order to extend the information-theoretic approach developed in [0 for coarse-graining of
Gibbs states to dynamics, a parameter fitting procedure for dynamic coarse-grained models was
developed in [23]. The method proposed there is based on minimization of the relative entropy
between discrete two-time transition probabilities associated with the diffusion process, in this case
Langevin dynamics. Moreover, the authors demonstrate that the relative entropy minimization can
be interpreted as a force-matching problem. The use of the two-time step probability limits the
applicability of the approach to short-time dynamics, while the discretization time step appears
explicitly in the minimization problem, leading to time-step-dependent optimal parameters. In a
recent article, [20], the authors attempt to overcome the short-time limit using Bayesian inference to
identify the most probable parameters for a given time series of microscopic states, i.e., from a
path-space perspective. The authors also provide the connection with force matching, where again
the optimal parameter set depends on the time step of the numerical discretization scheme. The
relative entropy rate (RER) functional for Markov chains proposed in [64] and [0 is similar to the
functional that defines the best-fit optimization in [0, being the relative entropy per unit time for
stationary processes. The formulation of path-space relative entropy for continuous-time processes
in the present work illustrates that both the path-space relative entropy and the RER are independent
of the time step for any numerical discretization scheme. This fact is further demonstrated in
Appendix B, where we study the RER minimization for numerical schemes for the Langevin and
the overdamped Langevin dynamics.

In fact, in the present article, as well as earlier in 4^, we show that path-space information
metrics such as the RER provide a general framework in several additional directions: (a) they are
applicable to discrete-time or discretized (as in 0 ) dynamics and also to broad classes of
continuous-time stochastic dynamics such as Kinetic Monte Carlo algorithms (e.g., reaction-diffusion
mechanisms on lattices), biochemical reaction networks and semi-Markov processes, 6^, [4^, [65];
(b) they apply to non-equilibrium problems without analytically known Gibbs states and without a
detailed balance condition (irreversible processes), including driven Langevin models, Kinetic Monte
Carlo algorithms and reaction networks; and most importantly, (c) our RER perspective shows that the
corresponding optimization methods are extendable to both finite and infinite times, guaranteeing
quantified predictive capability even at long-time regimes, 2^. In order to demonstrate the
abstract principles of the information-theoretic framework on the path space we present
coarse-graining of dynamics described as solutions to Itô's stochastic differential equations. The presented
information-theoretic methodology allows us to build optimal coarse-grained dynamics as systematic
approximations of microscopic stochastic dynamics in the same general class of Markovian
dynamics, e.g., of stochastic differential equations. The path-space information approach
compares microscopic and coarse-grained dynamics using the relative entropy between distributions on
the path space (see Theorem El]) and sets up the corresponding path-space variational inference
(parameter optimization) problems.

One of the mathematical and computational novelties of the presented approach lies in the
derivation of path-space force-matching conditions which are applicable to both equilibrium and
non-equilibrium systems. This path-space information theory formulation provides a natural
generalization to non-equilibrium systems of the force matching methods developed earlier for systems
at equilibrium, [^ . [37]. Moreover, we demonstrate here the equivalence of the relative entropy
rate (RER) and force-matching type optimization methods, in analogy to the equilibrium case
studied in [60], [67], [STj . Furthermore, due to information inequalities such as (5) and (7),
path information-based coarse-graining implies transferability of the parameterization to all
reasonable observables by training only a single observable, namely the relative entropy, at the
specific thermodynamic point. We explore a different but asymptotically equivalent (in the number
of data) perspective, which is data-driven in the sense that it treats the microscopic simulator only
as a means of producing statistical data in the form of time series.

We also stress in this work connections between information-based work for coarse-graining and 
the variational inference methods primarily developed in the machine learning community. 

In Section 2 we give a short overview of variational inference and coarse-graining as well as
connections to machine learning methods and computational approaches. Next, in Section 3 we
describe the microscopic and coarse-grained models and the corresponding dynamics. Section 4
describes the path-space relative entropy and relative entropy rate minimization problems and
sets up the theoretical tools for comparing the microscopic and coarse dynamics at finite and
long-time, stationary regimes. In Section 5 we prove that the minimization of the path-space
relative entropy is a path-space force matching problem, providing a generalization of the known
equilibrium force matching method to non-equilibrium systems. Applications of our results are
presented in Section 6 for the Langevin dynamics, demonstrating applicability of the proposed
method to molecular systems while presenting specific examples of coarse-graining transformations.
Moreover, a direct study of the relative entropy rate minimization for the discretized Langevin and
overdamped Langevin dynamics is discussed, verifying the validity of the continuous-time optimal
coarse-grained model for the corresponding time-discretized schemes. In Section 7 we present the
use of the relative entropy minimization as a means of optimizing the information content in a
coarse model with respect to available time series data coming from a fine-scale simulation. Finally,
in the last section we summarize the contributions of the present work.


2. Variational inference methods and coarse-graining of molecular systems. 

Here we give a short overview of information-based coarse-graining of molecular systems,
highlight connections with variational inference methods primarily developed in the machine
learning literature, and discuss computational approaches.


2.1. Information-theoretic methods for coarse-graining molecular systems. 

Computational methods developed for parameterizing coarse-grained models at equilibrium,
such as inverse Monte Carlo, [53], 0 , and relative entropy, 0 . [12], provide a development of the
CG interaction potential by considering a pre-selected set of observables $\phi_i$, $i = 1, \dots, n$,
and then minimizing a fitting functional over the parameter space $\Theta$. Typical choices of
observables $\phi_i$ are radial distribution functions and forces between CG particles, [34], [33],
[68]. A family of parametrization methods is described by

$$ \min_{\theta \in \Theta} \sum_{i=1}^{n} \left| \mathbb{E}_{\mu}[\phi_i] - \mathbb{E}_{\bar\mu^{\theta}}[\phi_i] \right|^{2} , \qquad (1) $$

where $\mu$ denotes the fine-scale Gibbs equilibrium distribution and $\bar\mu^{\theta}$ the parametrized
coarse-grained Gibbs distribution. Such methods are referred to as the iterative inverse Boltzmann method
and the inverse Monte Carlo methods, [53], [52], [72], 0. Clearly any parametrization based on a
minimization principle such as (1) depends on the specific choice of observables, while the accurate
simulation of other observables, which are not part of the parameterization (1), is not necessarily
guaranteed.

Before we continue further we introduce the mathematical concepts involved in coarse-graining. In
abstract terms we consider the original (microscopic) model defined on a measurable space
$(\Omega, \mathcal{B})$, where $\Omega$ represents the state (configuration) space and $\mathcal{B}$
denotes the $\sigma$-algebra on $\Omega$, and the coarse-grained model on
$(\bar\Omega, \bar{\mathcal{B}})$ with the coarse-graining map

$$ \Pi : \Omega \to \bar\Omega . \qquad (2) $$

The elements of the coarse state space $\bar\Omega$ (the coarse degrees of freedom) are thus
$\bar\omega = \Pi\omega$. We use the bar notation for objects related to the coarse-grained model.
The (probability) measures $P$ on the microscopic space $(\Omega, \mathcal{B})$ are mapped
(pushed forward) by the map $\Pi_*$, $\bar P = \Pi_* P$,

$$ \mathbb{E}_{\bar P}[\phi] = \int_{\bar\Omega} \phi \, d(\Pi_* P) = \int_{\Omega} \phi \circ \Pi \, dP , \qquad (3) $$

where $\phi : \bar\Omega \to \mathbb{R}$, or equivalently $(\Pi_* P)(\bar B) = P(\Pi^{-1}(\bar B))$
where $\bar B \in \bar{\mathcal{B}}$.

The relative entropy (Kullback-Leibler divergence), [17], of two probability measures $P(d\omega)$ and
$Q(d\omega)$ on a common measurable space $(\Omega, \mathcal{B})$ is given by

$$ \mathcal{R}(P \,|\, Q) = \int_{\Omega} \log \frac{dP(\omega)}{dQ(\omega)} \, P(d\omega) , \qquad (4) $$

provided $P \ll Q$, i.e., $P$ is absolutely continuous with respect to $Q$, and
$\mathcal{R}(P \,|\, Q) = +\infty$ otherwise. The functional $\mathcal{R}(P \,|\, Q)$ defines a
pseudo-distance between two measures, as $\mathcal{R}(P \,|\, Q) \ge 0$ and
$\mathcal{R}(P \,|\, Q) = 0$ if and only if $P = Q$, $P$-a.s. In the case these probability measures
have corresponding probability densities $p(\omega)$ and $q(\omega)$, relation (4) becomes
$\mathcal{R}(P \,|\, Q) = \int_{\Omega} p(\omega) \log \frac{p(\omega)}{q(\omega)} \, d\omega$.

In contrast to the observable-centered perspective of (1), we turn our attention to information
inequalities and their implications for coarse-graining. For instance, the Csiszár-Kullback-Pinsker
(CKP) inequality, [17], when applied to an observable $\phi$ and the fine-scale and coarse-grained
equilibrium distributions considered in (1), readily gives rise to an error bound

$$ \left| \mathbb{E}_{\bar\mu^{\theta}}[\phi] - \mathbb{E}_{\Pi_*\mu}[\phi] \right| \le \|\phi\|_{\infty} \sqrt{2 \mathcal{R}(\Pi_*\mu \,|\, \bar\mu^{\theta})} , \qquad (5) $$

where $\Pi$ is the coarse-graining map, $\Pi_*\mu$ denotes the push-forward of the microscopic measure
$\mu$ defined above, and $\|\phi\|_{\infty} = \sup\{|\phi(x)| : x \in \bar\Omega\}$ is the uniform norm of $\phi$.
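
As a numerical illustration of the bound (5), the following minimal Python sketch evaluates both sides for two discrete distributions. The three-state distributions `p` and `q` and the observable `phi` are hypothetical values chosen only for illustration; `p` stands in for the push-forward $\Pi_*\mu$ and `q` for the coarse-grained model $\bar\mu^{\theta}$.

```python
import math


def relative_entropy(p, q):
    """R(P|Q) = sum_i p_i log(p_i / q_i); +inf if P is not absolutely continuous w.r.t. Q."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # convention: 0 * log 0 = 0
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


def expectation(phi, p):
    """Average of the observable phi under the discrete distribution p."""
    return sum(f * pi for f, pi in zip(phi, p))


def ckp_bound(phi, p, q):
    """Right-hand side of the CKP bound: ||phi||_inf * sqrt(2 R(P|Q))."""
    sup_norm = max(abs(f) for f in phi)
    return sup_norm * math.sqrt(2.0 * relative_entropy(p, q))


# Hypothetical data: p plays the role of the projected fine-scale measure,
# q the role of the parametrized coarse-grained model.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
phi = [1.0, -2.0, 0.5]

error = abs(expectation(phi, p) - expectation(phi, q))
bound = ckp_bound(phi, p, q)
print(f"observable error = {error:.4f}, CKP bound = {bound:.4f}")
```

The same relative entropy value bounds the error of every bounded observable, which is the sense in which a single fitted quantity controls many observables at once.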
Since the microscopic and coarse-grained problems are formulated on different measure spaces it
is necessary to clarify how the relative entropy is defined in (5). Note that from the computational
point of view $\mathcal{R}(\Pi_*\mu \,|\, \bar\mu^{\theta})$ is still computed on the microscopic space
using the formula (3) for the push-forward. We also refer to the later sections for precise formulas
and related estimators. Another option for comparing the two models in terms of relative entropy is
to map the coarse model onto the microscopic space, i.e., to define a (microscopic) reconstruction
map. The reconstruction map can be viewed as a generalized inverse
$\Pi^{\dagger} : \bar\Omega \to \Omega$ and as such is not necessarily unique. Associated
with the reconstruction map is the pull-back map $\Pi^{\dagger}\bar P$ that defines a measure on the
microscopic space $\Omega$. In that way we can also compare the models by considering
$\mathcal{R}(P \,|\, \Pi^{\dagger}\bar P)$, a strategy we apply in Section 5.

Returning to (5), as was pointed out in [46] in the context of coarse-graining of stochastic lattice
systems and Kinetic Monte Carlo algorithms, the CKP inequality provides a strong indication
that the relative entropy can control all observables $\phi$. In our context, the CKP inequality (5)
demonstrates that minimizing the single observable given by the relative entropy
$\mathcal{R}(\Pi_*\mu \,|\, \bar\mu^{\theta})$, proposed in [13], 0 , 0 , 0 , i.e., training the
parametric coarse-grained model based on

$$ \min_{\theta \in \Theta} \mathcal{R}(\Pi_*\mu \,|\, \bar\mu^{\theta}) \qquad (6) $$

instead of (1), will provide reliable coarse-graining parameterizations, applicable to various
observables. Hence, due to (5), information-theoretic methods give rise to enhanced transferability
properties of the resulting coarse-grained model with respect to other observables $\phi$, at a specific
thermodynamic point.


Remark 2.1. Very recently a sharper version of the CKP inequality has been established in [16].
Furthermore, in 2^, it is shown that such information inequalities can be extended to path-space
observables, e.g., ergodic averages, correlations, etc. The error between averaging an observable
under the probabilistic model described by $P$ as compared to the model given by $Q$ can then be
bounded,

$$ \Phi_{-}(P, Q; \phi) \le \mathbb{E}_{P}[\phi] - \mathbb{E}_{Q}[\phi] \le \Phi_{+}(P, Q; \phi) . \qquad (7) $$

In [0 the authors refer to $\Phi_{\pm}(P, Q; \phi)$ as a "goal-oriented divergence" because it has the
properties of a divergence both in the sense of probabilities $P$ and $Q$ and observables $\phi$:
$\Phi_{+}(P, Q; \phi) \ge 0$ (resp. $\Phi_{-}(P, Q; \phi) \le 0$) and $\Phi_{\pm}(P, Q; \phi) = 0$ if
and only if $P = Q$ a.s. or $\phi$ is constant $P$-a.s. Furthermore, $\Phi_{\pm}(P, Q; \phi)$ admits an
expansion in the (small) relative entropy, [j^ ]:

$$ \Phi_{\pm}(P, Q; \phi) = \pm \sqrt{\mathrm{Var}_{P}[\phi]} \, \sqrt{2 \mathcal{R}(P \,|\, Q)} + O\big(\mathcal{R}(P \,|\, Q)\big) , $$

that captures the key properties of this new divergence. Similar relations hold for path-space
observables, with the role of relative entropy played by the relative entropy rate (RER),
defined below in this section.




2.2. Variational inference and machine learning. 

At this point we digress and point out connections between this and earlier information-based
works for coarse-graining and variational inference methods. In the context of statistical variational
inference (see for instance [74], [55], [59]) one defines a flexible and rich enough parametrized family of
distributions, and one finds the member of the family which is closest to the posterior distribution
in a suitable metric. Subsequently one samples from that approximating distribution instead of
the posterior itself. In this sense the inference problem is tackled using an optimization principle,
hence the term “variational inference”.

More precisely, if we denote the posterior by $P$ and the parametrized family by $Q^{\theta}$, with
parameter $\theta$ in the parameter space $\Theta$, we typically consider in variational inference
their “distance” as the minimization over all corresponding relative entropies,

$$ \min_{\theta \in \Theta} \mathcal{R}(P \,|\, Q^{\theta}) . \qquad (8) $$

Reversing the order to $\mathcal{R}(Q^{\theta} \,|\, P)$ can also be considered (as we also do in this
paper), capturing different aspects of the posterior $P$ due to the non-symmetry of the relative
entropy, [59], [55]. In addition to the single parameter vector optimization in (8), we can also
consider approximating Bayesian inference. This latter perspective gives rise to variational Bayesian
inference, also known as ensemble learning, [^.
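
The optimization principle (8) can be sketched on a toy problem. The following Python example is only an illustration under stated assumptions: the target `p` is a hypothetical distribution on {0, ..., 4}, and the parametrized family $Q^{\theta}$ is taken to be Binomial(4, θ), a choice made here purely for convenience. For this family $\mathcal{R}(P \,|\, Q^{\theta})$ is convex in θ, so a ternary search suffices, and the minimizer can be checked against the mean-matching value $\mathbb{E}_P[k]/4$ that follows from setting the θ-derivative to zero.

```python
import math


def relative_entropy(p, q):
    """R(P|Q) for discrete distributions, with the convention 0 * log 0 = 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


def binomial_pmf(n, theta):
    """Probabilities of Binomial(n, theta) on {0, ..., n} (the assumed family Q^theta)."""
    return [math.comb(n, k) * theta**k * (1.0 - theta) ** (n - k) for k in range(n + 1)]


def fit_theta(p, n, iters=200):
    """Minimize theta -> R(P | Q^theta) by ternary search (valid because it is convex)."""
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if relative_entropy(p, binomial_pmf(n, m1)) < relative_entropy(p, binomial_pmf(n, m2)):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


# Hypothetical target distribution on {0, 1, 2, 3, 4}.
p = [0.4, 0.3, 0.2, 0.1, 0.0]
theta_star = fit_theta(p, n=4)
mean_match = sum(k * pk for k, pk in enumerate(p)) / 4  # analytic minimizer E_P[k] / n
print(f"numerical minimizer = {theta_star:.6f}, mean matching = {mean_match:.6f}")
```

Minimizing in the reversed order $\mathcal{R}(Q^{\theta} \,|\, P)$ would in general select a different θ, which is the non-symmetry noted above.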

In many important cases, e.g., when $Q^{\theta}$ is a class of mean-field models, that is a parametrized
family of product distributions over the state space, the optimization problem (8) can be solved
analytically, [74], [55], [s^. For instance, the mean-field theory for the ferromagnetic Ising model
can also be derived as the solution of (8) when $Q^{\theta}$ is a family of product distributions, [55].
Furthermore, more complex parametric families than mean-field have also been considered in
the literature, see for instance [74].

On the other hand, turning our attention towards the posterior $P$, an important class of models
is exponential families, and in particular Gibbs distributions such as (12), also referred to in the
machine learning literature as Boltzmann machines, |5f, [55]. For such posteriors, the minimization
in (8) yields an equivalent variational free energy, [51]. Indeed this observation was also made in
the coarse-graining literature and was the starting point of coarse-grained parameterizations based
on a similar principle to the variational inference in (8), [69], [13], [^, [67], and [i^. However, an
additional complexity to (8) arises in the coarse-graining case, where the coarse-graining map $\Pi$ in
(2) enters in the optimization problem (6),

$$ \min_{\theta \in \Theta} \mathcal{R}(\Pi_* P \,|\, Q^{\theta}) , \qquad (9) $$

where $Q^{\theta}$ is the parametrized coarse-grained model and $P$ the microscopic Gibbs distribution. In
addition to the variational inference point of view that allows for optimal parameterization of a
coarse-grained model, we can also use (9) as a means to compare different coarse-graining maps $\Pi^{(i)}$.
That is, we can deploy (9) as an Information Criterion to assess and order their relative effectiveness,
i.e., $\Pi^{(1)}$ is a superior coarse-graining map to $\Pi^{(2)}$ if and only if

$$ \min_{\theta \in \Theta} \mathcal{R}(\Pi^{(1)}_* P \,|\, Q^{\theta}) < \min_{\theta \in \Theta} \mathcal{R}(\Pi^{(2)}_* P \,|\, Q^{\theta}) . \qquad (10) $$

The heuristic interpretation of (10) is that $\Pi^{(1)}$ provides better information compression than
$\Pi^{(2)}$. Authors in [2^ compare different coarse-graining maps (resolutions) based on the entropic
component of the many-body potential of mean force.


2.3. Computational variational inference methods in coarse-graining. 

In equilibrium systems where the coarse-graining of Gibbs distributions $P$ is considered, the
coarse-grained family $Q^{\theta}$ in (9) is not necessarily mean field, so there is no available analytic
solution such as the ones discussed in Section 2.2. In this case the minimization (9) is carried out using
numerical optimization methods, 0,0. Although one can consider steepest descent and Newton-Raphson
type methods for this optimization problem, the most efficient methods are stochastic optimization
methods in the spirit of the Robbins-Monro algorithm; for the latter see 0 and references therein.
In particular, an algorithm has been proposed that is essentially a stochastic optimization
version of the Newton-Raphson algorithm. This method improves the Robbins-Monro algorithm
by employing a natural gradient time-stepping, 0. The natural gradient time-stepping arises as
part of the Newton-Raphson scheme for (9), since the Hessian of the relative entropy is exactly
the Fisher Information Matrix. A similar algorithm was also proposed in machine learning for
stochastic variational inference, 3^. Finally, a Newton-Raphson method was introduced in 4^
for the minimization of path-space relative entropy for dynamics and non-equilibrium systems
discussed in Section 4. There the role of the natural gradient is played by the path-space Fisher
Information Matrix, which is a type of Fisher Information for dynamics introduced first in the
context of sensitivity analysis in [64].
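
The interplay between Robbins-Monro stepping and the natural gradient can be sketched in a deliberately simple setting. The example below is an assumption-laden toy, not the algorithm of the cited works: the family is $Q^{\theta} = N(\theta, \sigma^2)$ with known $\sigma$, so the one-sample gradient of $\mathcal{R}(P \,|\, Q^{\theta})$ is $(\theta - x)/\sigma^2$ and the Fisher information is $1/\sigma^2$. The function name and the step sizes $a_k = 1/(k+1)$ are illustrative choices.

```python
import random


def robbins_monro_natural_gradient(samples, sigma, theta0=0.0):
    """Stochastic approximation of argmin_theta R(P | N(theta, sigma^2)) from samples of P."""
    theta = theta0
    fisher = 1.0 / sigma**2  # Fisher information of N(theta, sigma^2) w.r.t. theta
    for k, x in enumerate(samples):
        grad = (theta - x) / sigma**2  # one-sample estimate of d/dtheta R(P | Q^theta)
        step = 1.0 / (k + 1)           # Robbins-Monro steps: sum a_k = inf, sum a_k^2 < inf
        theta -= step * grad / fisher  # natural gradient: precondition by the inverse Fisher
    return theta


# Hypothetical "microscopic" data: samples with mean 1.5 and standard deviation 2.
rng = random.Random(0)
data = [rng.gauss(1.5, 2.0) for _ in range(5000)]

theta_hat = robbins_monro_natural_gradient(data, sigma=2.0)
sample_mean = sum(data) / len(data)
print(f"theta_hat = {theta_hat:.4f}, sample mean = {sample_mean:.4f}")
```

With these step sizes the preconditioned iteration reduces to the running sample mean, independently of $\sigma$; without the Fisher preconditioning the effective step would scale with $1/\sigma^2$.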


3. Microscopic, coarse-grained and reconstructed processes for molecular systems

In this section we describe a prototypical molecular system at the microscopic scale and
introduce its coarse-graining as a configuration transformation that lumps together degrees of freedom.
We define the underlying microscopic evolution in terms of a general stochastic differential equation 
and propose parametrized stochastic dynamics as a Markovian approximation of the coarse-grained 
process. We introduce a reconstruction of the coarse process defining a process that reintroduces 
the lost degrees of freedom and is approximating the microscopic evolution. The reconstructed 
process serves as an auxiliary process that connects the coarse-grained approximating dynamics 
with the microscopic dynamics on the same space. 


3.1. Microscopic dynamics 

We assume a prototypical system of $N$ (classical) molecules in a box of fixed volume $V$ at
temperature $T$. Let $q = (q_1, \dots, q_N) \in \mathbb{R}^{3N}$ describe the position vectors of the
$N$ particles in the microscopic description and $p = (p_1, \dots, p_N) \in \mathbb{R}^{3N}$ the
momentum vectors. We denote by $X = (q, p) \in \mathbb{R}^{6N}$ the joint vector of position and
momentum. We consider the evolution of the $N$ particles described by a diffusion process
$\{X_t\}_{t \ge 0}$, a continuous-time Markov process satisfying the stochastic differential equation (SDE)

$$ dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t , \quad t > 0 , \qquad X_0 \sim \mu_0 , \qquad (11) $$

where $b(x) \in \mathbb{R}^{6N}$ and $\sigma(x) \in \mathbb{R}^{6N \times k}$, $k \le 6N$, are the drift
and diffusion coefficients and $B_t$ denotes the standard $k$-dimensional Brownian motion. The notation
$X_0 \sim \mu_0$ means that the random variable $X_0$ is distributed according to the probability
$\mu_0$. Throughout this paper we assume that the vector field $b(x)$ and the diffusion coefficient
field $\sigma(x)$ are such that the system (11) has a unique solution for all $t > 0$, [61]. The general
form (11) also accommodates the case of Hamiltonian equations of motion with the Langevin thermostat
used for describing molecular systems at equilibrium in a thermal bath.
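
A minimal sketch of how a system of the form (11) can be simulated is given below, using the Euler-Maruyama scheme. The setting is an assumption made for brevity: a scalar overdamped Langevin equation with the hypothetical potential $U(x) = x^2/2$ and inverse temperature `beta`, rather than the full $6N$-dimensional Langevin system. For this choice the stationary variance is $1/\beta$, which gives a simple sanity check.

```python
import math
import random


def euler_maruyama(b, sigma, x0, dt, n_steps, rng):
    """Discrete path X_0, ..., X_n of the scalar SDE dX = b(X) dt + sigma(X) dB."""
    path = [x0]
    x = x0
    sqrt_dt = math.sqrt(dt)
    for _ in range(n_steps):
        # Brownian increment over one step is N(0, dt).
        x = x + b(x) * dt + sigma(x) * sqrt_dt * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


beta = 1.0  # hypothetical inverse temperature
path = euler_maruyama(
    b=lambda x: -x,                           # drift -U'(x) for U(x) = x^2 / 2
    sigma=lambda x: math.sqrt(2.0 / beta),    # constant diffusion coefficient
    x0=0.0,
    dt=0.01,
    n_steps=200_000,
    rng=random.Random(1),
)

samples = path[1_000:]  # discard a short initial transient
mean = sum(samples) / len(samples)
variance = sum((x - mean) ** 2 for x in samples) / len(samples)
print(f"empirical variance = {variance:.3f}, Gibbs value 1/beta = {1.0 / beta:.3f}")
```

The time series `path` is the kind of correlated data on which the path-space methods of the later sections operate.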

In general a Langevin process does not necessarily possess a stationary distribution, or such a
distribution may not be explicitly known. This can be the case when non-conservative external
forces appear or when the detailed balance condition fails. Note that for stationary dynamics the
detailed balance condition and time-reversibility are equivalent. If the force $F(q)$ is conservative
and the fluctuation-dissipation relation is satisfied, the Langevin dynamics are time reversible up
to momentum reversal, [50], which guarantees that the Gibbs canonical measure is a stationary
distribution,

$$ \mu(dq) = Z^{-1} \exp\{-\beta U(q)\}\,dq , \qquad (12) $$

where $U(q)$ is the potential energy such that $F(q) = -\nabla U(q)$, $Z = \int e^{-\beta U(q)}\,dq$ is
the partition function and $\beta = 1/(k_B T)$, with $k_B$ the Boltzmann constant. On the other hand,
for open systems, where the force $F(q)$ has a non-conservative part, ( 2 ^ , i.e.,

$$ F(q) \ne -\nabla U(q) , $$

a non-equilibrium steady state (NESS) exists for which the condition of detailed balance fails. We
note here that in machine learning and neural networks distributions such as (12) are called
Boltzmann machines, and related reversible dynamics are considered for instance in stochastic Hopfield
models, [55].


3.2. Coarse-grained and reconstructed dynamics 

Coarse-graining is considered as the application of a linear mapping (CG mapping) (see also
(2) for a more general definition)

$$ \Pi : \mathbb{R}^{n} \to \mathbb{R}^{m} , \qquad x \mapsto \Pi x \in \mathbb{R}^{m} , $$

on the microscopic state space. For notational simplicity we set $n = 6N$ and $m = 6M$. The mapping
determines the $M$ ($< N$) CG particles with the state $\bar x = \Pi x$ as a function of the microscopic
state $x$. Examples of CG maps commonly used for molecular systems include the mapping to the centers
of mass of groups of atoms, the end-to-end vector of molecular chains, and projections to a collection of
atoms, see also Section 6.2. We call ‘particles’ and ‘CG particles’ the elements of the microscopic
and coarse configuration space respectively.
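
The center-of-mass mapping mentioned above can be written explicitly as a matrix. The sketch below is a simplified illustration: it acts on a single Cartesian position coordinate per atom rather than on the full phase-space vector in $\mathbb{R}^{6N}$, and the masses, groups and helper names are hypothetical.

```python
def center_of_mass_map(masses, groups):
    """M x N matrix Pi with Pi[I][i] = m_i / (total mass of group I) if atom i is in group I."""
    n_atoms = len(masses)
    pi = []
    for group in groups:
        total_mass = sum(masses[i] for i in group)
        row = [0.0] * n_atoms
        for i in group:
            row[i] = masses[i] / total_mass
        pi.append(row)
    return pi


def apply_map(pi, x):
    """Matrix-vector product Pi x, i.e. the coarse state for the microscopic state x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in pi]


# Hypothetical system: four atoms lumped into two CG particles.
masses = [1.0, 3.0, 2.0, 2.0]
groups = [[0, 1], [2, 3]]
Pi = center_of_mass_map(masses, groups)

x = [0.0, 4.0, 10.0, 12.0]  # one position coordinate per atom
x_bar = apply_map(Pi, x)
print(Pi)     # [[0.25, 0.75, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
print(x_bar)  # [3.0, 11.0]
```

Since each row of `Pi` has several nonzero entries, the map is not one-to-one: many microscopic states share the same coarse state.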

The proposed coarse space dynamics are described by a Markov process $\{\bar X_t\}_{t \ge 0}$ in
$\mathbb{R}^{m}$, approximating the process $\{\Pi X_t\}_{t \ge 0}$ which is, in principle,
non-Markovian. The Markov process $\{\bar X_t\}_{t \ge 0}$ is given as the solution of the parametrized
stochastic differential equations

$$ d\bar X_t = \bar b(\bar X_t; \theta)\,dt + \bar\sigma(\bar X_t; \theta)\,d\bar B_t , \quad t > 0 , \qquad \bar X_0 \sim \bar\mu_0 , \qquad (13) $$

where the drift $\bar b(\bar x; \theta) \in \mathbb{R}^{m}$ and diffusion
$\bar\sigma(\bar x; \theta) \in \mathbb{R}^{m \times l}$, $l \le m$, coefficients are parametrized with
$\theta \in \Theta$, and $\bar B_t$ is an $l$-dimensional standard Brownian motion. As we have already
indicated, the goal of our study is to find the most effective among the proposed CG models such that
$\{\bar X_t\}_{t \ge 0}$ “best approximates” the process $\{\Pi X_t\}_{t \ge 0}$, that is, to find optimal
$\bar b(\bar x; \theta)$ and $\bar\sigma(\bar x; \theta)$ in a parametric or non-parametric form, which
is the subject of Section 5.

We define a reconstructed process of a coarse process $\{\bar X_t\}_{t \ge 0}$ in $\mathbb{R}^{m}$ onto
the microscopic space $\mathbb{R}^{n}$ to be any stochastic process $\{\tilde X_t\}_{t \ge 0}$ which satisfies

$$ \Pi \tilde X_t = \bar X_t , \quad t \ge 0 , \ \text{in distribution.} \qquad (14) $$

A reconstructed process $\{\tilde X_t\}_{t \ge 0}$ is obviously not unique (a trivial example is when
$\Pi$ is not a one-to-one transformation), though the non-uniqueness is not of concern for our
methodology. The path-space measures in $[0, T]$ of the processes we consider are denoted by

$P_{[0,T]}$ for the microscopic process $\{X_t\}_{t \ge 0}$,
$\bar Q^{\theta}_{[0,T]}$ for the coarse process $\{\bar X_t\}_{t \ge 0}$, and (15)
$Q^{\theta}_{[0,T]} = \Pi^{\dagger} \bar Q^{\theta}_{[0,T]}$ for the reconstructed process $\{\tilde X_t\}_{t \ge 0}$,

where the notation $\Pi^{\dagger} \bar Q^{\theta}$ is described in Section 2 as the reconstruction of the
coarse path-space measure onto the microscopic space. For the purpose of the present work we assume that
the reconstructed process is the solution of the system

$$ d\tilde X_t = b(\tilde X_t; \theta)\,dt + \sigma(\tilde X_t; \theta)\,dB_t , \quad t > 0 , \qquad \tilde X_0 \sim \nu_0 . \qquad (16) $$

The coefficients $b(x; \theta)$ and $\sigma(x; \theta)$ must be such that the relation (14) is satisfied.
Note that from the definition of the reconstructed process we have that for every observable of the form

$$ f(x) = g(\Pi x) , $$

i.e., a coarse observable, the expectations with respect to the probability of the reconstructed and
the coarse process are identical,

$$ \mathbb{E}^{x}[f(\tilde X_{\tau})] = \bar{\mathbb{E}}^{\Pi x}[g(\bar X_{\tau})] , \qquad (17) $$

when $\tilde X_0 = x$ and $\bar X_0 = \Pi x$ and for any stopping time $\tau > 0$. We denote by
$\mathbb{E}^{x}$ the expectation with respect to the probability $R^{x}$ of $\{\tilde X_t\}_{t \ge 0}$,
$R^{x}(\tilde X_{t_1} \in F_1, \dots, \tilde X_{t_k} \in F_k) = P(\tilde X^{x}_{t_1} \in F_1, \dots, \tilde X^{x}_{t_k} \in F_k)$,
for any $F_i$, $i = 1, \dots, k$, subsets of $\mathbb{R}^{n}$, where $\tilde X^{x}_t$ denotes that
$\tilde X_t$ starts at $\tilde X_0 = x$. $\bar{\mathbb{E}}^{\Pi x}$ denotes the expectation with respect
to the probability of $\{\bar X_t\}_{t \ge 0}$ starting at $\bar X_0 = \Pi x$.

With the following theorem we give sufficient conditions that $b(x; \theta)$ and $\sigma(x; \theta)$
must satisfy in order for the relation (14) to hold. A detailed study of reconstructed processes is
presented in the work [i^, where explicit examples for stochastic lattice systems are presented. The
proof of the theorem is given in Appendix A and follows from the martingale uniqueness theorem, [61].


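Theorem 3.1. (Reconstruction)

Let the processes $\{\tilde X_t\}_{t\ge 0}$ in $\mathbb{R}^n$ and $\{\bar X_t\}_{t\ge 0}$ in $\mathbb{R}^m$ be solutions of (16) and (13) respectively, and $\Pi : \mathbb{R}^n \to \mathbb{R}^m$ a linear mapping. Assume that $\tilde b(x;\theta)$ and $\tilde\sigma(x;\theta)$ are such that the existence and uniqueness of solutions to (16) is guaranteed. If

$$\Pi\,\tilde b(x;\theta) = \bar b(\Pi x;\theta), \tag{18}$$

$$\Pi\,\tilde\Sigma(x;\theta)\,\Pi^{\mathrm{tr}} = \bar\Sigma(\Pi x;\theta) \quad \text{for all } x \in \mathbb{R}^n, \tag{19}$$

where $\tilde\Sigma(x;\theta) = \tilde\sigma(x;\theta)\tilde\sigma^{\mathrm{tr}}(x;\theta)$, $\bar\Sigma(x;\theta) = \bar\sigma(x;\theta)\bar\sigma^{\mathrm{tr}}(x;\theta)$ and $^{\mathrm{tr}}$ denotes matrix transpose, then

$$\Pi \tilde X_t = \bar X_t, \quad t \ge 0, \ \text{in distribution.}$$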

In particular, the relation (17) holds for all coarse observables $f(x) = g(\Pi x)$.

Remark 3.1. (a) For a non-linear CG mapping $\Pi$, such as transformations to reaction coordinates Bli3, relations (18) and (19) do not hold. The proof of Theorem 3.1 indicates how these relations can be generalized to non-linear mappings.

(b) In the rest of our work we assume that the diffusion term $\bar\sigma(x;\theta)$ is not parametrized, i.e., $\bar\sigma(x;\theta) = \bar\sigma(x)$, and is such that

$$\bar\Sigma(\Pi x) = \Pi\,\Sigma(x)\,\Pi^{\mathrm{tr}},$$

where $\Sigma(x) = \sigma(x)\sigma^{\mathrm{tr}}(x)$ and $\bar\Sigma(x;\theta) = \bar\sigma(x)\bar\sigma^{\mathrm{tr}}(x)$. This requirement ensures that the diffusion coefficient of the reconstructed process $\{\tilde X_t\}_{t\ge 0}$ coincides with the one of the microscopic process,

$$\tilde\sigma(x;\theta) = \sigma(x), \tag{20}$$

a condition that we need for the development of the theoretical tools in Section 4.1. We point out that the assumption (20) is not necessary in the development of the variational inference problem for data driven systems; this will be further discussed in Section 7.


4. Variational inference for coarse-grained dynamics. 


In analogy to the previous discussion in Section [2] for the coarse-graining of Gibbs distributions, 
our approach to coarse-graining of dynamics can also be viewed as a variational inference method, 
however, this time set in the path space. We present an extension of the information-theoretic 
approach to systems with non-equilibrium steady states as well as dynamics in finite times. The 
presented method also allows for approximation of dynamical observables, i.e., quantities that are 
averaged over the path distribution instead of over a distribution at a terminal time. 

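The relative entropy between two path measures $P_{[0,T]}$ and $Q_{[0,T]}$ (see [4^ for specific examples) for the processes on the interval $[0,T]$ is

$$\mathcal{R}\big(P_{[0,T]} \,|\, Q_{[0,T]}\big) = \mathbb{E}_{P_{[0,T]}}\left[\log \frac{dP_{[0,T]}}{dQ_{[0,T]}}\right] = \int \log \frac{dP_{[0,T]}}{dQ_{[0,T]}}\, dP_{[0,T]}, \tag{21}$$

where $\frac{dP_{[0,T]}}{dQ_{[0,T]}}$ is the likelihood ratio (or the Radon-Nikodym derivative) of $P_{[0,T]}$ with respect to $Q_{[0,T]}$ and $\mathbb{E}_{P_{[0,T]}}[f]$ denotes averaging over the probability $P_{[0,T]}$ of paths $X_t$, $t \in [0,T]$. Naturally, we have to assume that the measures $P_{[0,T]}$, $Q_{[0,T]}$ are absolutely continuous, that is, if for an event $A$ it holds that $Q_{[0,T]}(A) = 0$ then $P_{[0,T]}(A) = 0$, and that $\log \frac{dP_{[0,T]}}{dQ_{[0,T]}}$ is $P_{[0,T]}$-integrable.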

In the setting of coarse-graining or model-reduction the measure $P_{[0,T]}$ is associated with the exact process (mapped to the coarse space) and $Q_{[0,T]}$ is associated with the approximating coarse-grained process. In general the relative entropy (21) in this dynamic setting is not a suitable object for analyzing steady states or long-time behavior; however, in practically relevant cases of stationary Markov processes we can work with the relative entropy rate (RER)

$$\mathcal{H}(P \,|\, Q) = \lim_{T\to\infty} \frac{1}{T}\, \mathcal{R}\big(P_{[0,T]} \,|\, Q_{[0,T]}\big), \tag{22}$$

where $P$ and $Q$ denote the corresponding stationary processes. The definitions (21) and (22) do not depend on knowing the distributions or an underlying Gibbs distribution, making them suitable for non-equilibrium problems where even the steady states are not known analytically. We also refer to [6J, [73], [2^ where this feature is employed to develop sensitivity analysis methods for non-equilibrium systems.

In order to select the best approximation of the coarse-grained model, i.e., of the exact coarse-grained measure $\bar Q_{[0,T]}$, we define a parametrized family of measures $\bar Q^{\theta}_{[0,T]}$ depending on parameters $\theta \in \Theta$. The best approximation is fitted using entropy based criteria in order to find the best Markovian approximation of the coarse-grained process. We consider the optimization principle

$$\min_{\theta\in\Theta} \mathcal{R}\big(P_{[0,T]} \,|\, \Pi^{\dagger}\bar Q^{\theta}_{[0,T]}\big), \tag{23}$$

where $P_{[0,T]}$ is the path distribution of the original microscopic process and $\Pi^{\dagger}\bar Q^{\theta}_{[0,T]}$ is the parametrized path-space coarse-grained distribution back-mapped to the microscopic space. Furthermore, for coarse-graining of stationary dynamics we consider the variational inference optimization problem based on the relative entropy rate (RER) instead of the full relative entropy,

$$\min_{\theta\in\Theta} \mathcal{H}\big(P \,|\, \Pi^{\dagger}\bar Q^{\theta}\big). \tag{24}$$

We note that for dynamics we have essentially another Gibbs structure such as m, however, this time in the space-time. Precisely this structure is used to obtain the relative entropy rate calculations in Theorem 4.1.


4.1. Path-space information methods for diffusion processes


The purpose of this section is to compare continuous time Markov processes on the same state space given as solutions of stochastic differential equations. We provide explicit representations of the relative entropy (RE) and relative entropy rate (RER) for the path-space measures of the processes $\{X_t\}_{t\ge 0}$ and $\{\tilde X_t\}_{t\ge 0}$ in $\mathbb{R}^n$, solutions of the SDEs (11) and (16) respectively, in terms of their drift and diffusion coefficients, for (a) the finite time and (b) the stationary (steady-state) regime for molecular systems under non-equilibrium conditions.

The key mathematical tool that allows us to express the RE $\mathcal{R}\big(P_{[0,T]} \,|\, Q^{\theta}_{[0,T]}\big)$ and the RER $\mathcal{H}(P \,|\, Q^{\theta})$, defined in (21) and (22) respectively, in terms of the drift and diffusion coefficients appearing in (11) and (16), is the Girsanov theorem, see Appendix A.2, (^,[11]. Suppose that there exists a process $\{u(X_s;\theta)\}_{s\ge 0}$ in $\mathbb{R}^k$ such that

$$\sigma(X_s)\,u(X_s;\theta) = b(X_s) - \tilde b(X_s;\theta), \quad \text{and} \quad \mathbb{E}\left[\exp\left\{\frac12\int_0^T |u(X_s;\theta)|^2\,ds\right\}\right] < \infty, \tag{25}$$

where $|u(x;\theta)|^2 = u^{\mathrm{tr}}(x;\theta)\,u(x;\theta)$. Recall that a process $\{X_t\}_{t\ge 0}$ is stationary when its joint distribution does not change with time.

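Theorem 4.1. Let $\{X_t\}_{t\ge 0}$, $\{\tilde X_t\}_{t\ge 0}$ be Markov processes, solutions of (11) and (16) with path space distributions $P_{[0,T]}$, $Q^{\theta}_{[0,T]}$ respectively, and $X_0 \sim \mu_0$, $\tilde X_0 \sim \nu_0$, for which (20) holds. Suppose that there exists a process $\{u(X_t;\theta)\}_{t\ge 0}$, as defined in (25). Then

a) the relative entropy between $P_{[0,T]}$ and $Q^{\theta}_{[0,T]}$ is

$$\mathcal{R}\big(P_{[0,T]} \,|\, Q^{\theta}_{[0,T]}\big) = \mathcal{H}_T\big(P_{[0,T]} \,|\, Q^{\theta}_{[0,T]}\big) + \mathcal{R}(\mu_0 \,|\, \nu_0), \tag{26}$$

where

$$\mathcal{H}_T\big(P_{[0,T]} \,|\, Q^{\theta}_{[0,T]}\big) = \mathbb{E}_{P_{[0,T]}}\left[\frac12\int_0^T |u(X_s;\theta)|^2\,ds\right]. \tag{27}$$

b) If furthermore $\{X_t\}_{t\ge 0}$ and $\{\tilde X_t\}_{t\ge 0}$ are stationary Markov processes, with $\mu(dx)$ the invariant measure for $\{X_t\}_{t\ge 0}$, then

$$\mathcal{R}\big(P_{[0,T]} \,|\, Q^{\theta}_{[0,T]}\big) = T\,\mathcal{H}(P \,|\, Q^{\theta}) + \mathcal{R}(\mu \,|\, \nu_0), \tag{28}$$

where $\mathcal{H}(P \,|\, Q^{\theta})$ is the relative entropy rate, which is given by

$$\mathcal{H}(P \,|\, Q^{\theta}) = \mathbb{E}_{\mu}\left[\frac12 |u(X;\theta)|^2\right]. \tag{29}$$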


The proof of the theorem is given in Appendix A.3. Theorem 4.1 a) gives a form of the relative entropy for any finite time interval $[0,T]$, while Theorem 4.1 b) addresses the long time regimes, providing a reduced form of (26). Note that the relation (26) holds for any process $\{\tilde X_t\}_{t\ge 0}$, not necessarily stationary. Another important result that Theorem 4.1 states is that relation (29) holds for any initial condition $\tilde X_0 \sim \nu_0$, where $\nu_0$ is not necessarily an invariant measure. The two properties that ensure the linear in time increase of the RE and the existence of the RER are (a) the Markov property and (b) the stationarity. Thus the ratio of the path probability densities on $[0,T]$ will be constant, independent of the time interval length $T$. Markovianity though is not the most general condition, for example the semi-Markov property is also sufficient, 64l |.

As we see from relation (26) the relative entropy $\mathcal{R}\big(P_{[0,T]} \,|\, Q^{\theta}_{[0,T]}\big)$ is a quantity that increases in time, thus calculation of the RE over long time intervals may become infeasible, though for finite time intervals calculations may be tractable. This fact is reflected in the following corollary, where we provide an explicit formula of the finite time component (27) along with one of the RER (29) in terms of the drift and diffusion coefficients.

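Corollary 4.1. (a) (RER representation for stationary processes)

Let $\sigma(x) \in \mathbb{R}^{n\times k}$, $x \in \mathbb{R}^n$, appearing in (11) with $\operatorname{rank}(\sigma(x)) = r = k < n$. Suppose that there exists $u(x;\theta) \in \mathbb{R}^k$ satisfying (25). Then

$$\mathcal{H}(P \,|\, Q^{\theta}) = \mathbb{E}_{\mu}\left[\frac12 \big\|b(X) - \tilde b(X;\theta)\big\|^2_{\Xi}\right], \tag{30}$$

where $\Xi(x) = [\sigma^{\mathrm{tr}}(x)\sigma(x)]^{-1}\sigma^{\mathrm{tr}}(x)$ and $\|\cdot\|_{\Xi}$ denotes the norm

$$\|z\|^2_{\Xi} = z^{\mathrm{tr}}\,\Xi^{\mathrm{tr}}(x)\,\Xi(x)\,z, \quad z \in \mathbb{R}^n. \tag{31}$$

(b) (RE representation for finite time)

$$\mathcal{R}\big(P_{[0,T]} \,|\, Q^{\theta}_{[0,T]}\big) = \mathcal{H}_T\big(P_{[0,T]} \,|\, Q^{\theta}_{[0,T]}\big) + \mathcal{R}(\mu_0 \,|\, \nu_0),$$

where

$$\mathcal{H}_T\big(P_{[0,T]} \,|\, Q^{\theta}_{[0,T]}\big) = \mathbb{E}_{P_{[0,T]}}\left[\frac12\int_0^T \big\|b(X_s) - \tilde b(X_s;\theta)\big\|^2_{\Xi}\,ds\right]. \tag{32}$$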


The result of Corollary 4.1 can be generalized to any $\sigma(x)$ with $\operatorname{rank}(\sigma(x)) = r \le k < n$ if we use in place of $\Xi(x)$ a (Moore-Penrose) generalized inverse of $\sigma(x)$, (b^. I3|. For completeness we provide the proof of Corollary 4.1 a) in Appendix A.4 and note that the proof of (b) is similar.

Remark 4.1. Note that the representations (30) and (32) are valid when $\{X_t\}_{t\ge 0}$ or $\{\tilde X_t\}_{t\ge 0}$ are Itô processes. The first time we need that $\{X_t\}_{t\ge 0}$ and $\{\tilde X_t\}_{t\ge 0}$ are both (Itô) diffusions is when we want to use the stationarity of the processes to simplify the RE to the RER, such that we have $u(x;\theta)$ independent of time $t$. If $b(x)$ is substituted by $b(x,t)$ (in which case the microscopic process is non-Markovian), then we would have

$$\mathcal{H}_T\big(P_{[0,T]} \,|\, Q^{\theta}_{[0,T]}\big) = \mathbb{E}_{P_{[0,T]}}\left[\frac12\int_0^T \big\|b(X_s,s) - \tilde b(X_s;\theta)\big\|^2_{\Xi}\,ds\right].$$

Even if $\{X_t\}_{t\ge 0}$ is stationary it is not obvious that $b(X_s,s) - \tilde b(X_s;\theta)$ is stationary, and the reduction to the RER needs to be checked case by case.
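As a small illustration of Corollary 4.1 (a), the following sketch evaluates the RER (30) for a scalar toy problem that does not appear in the paper: the microscopic process is assumed to be the Ornstein-Uhlenbeck process $dX_t = -aX_t\,dt + \sigma\,dB_t$ and the model process is $d\tilde X_t = -\theta \tilde X_t\,dt + \sigma\,dB_t$. With $n = k = 1$ we have $\Xi = 1/\sigma$, the invariant measure is $N(0, \sigma^2/(2a))$, and (30) reduces to $(a-\theta)^2/(4a)$. The function names and parameter values are hypothetical and only serve to compare a Monte Carlo average with this closed form.

```python
import math
import random


def rer_closed_form(a, theta):
    """RER (30) for the assumed scalar OU pair: (a - theta)^2 / (4 a)."""
    return (a - theta) ** 2 / (4.0 * a)


def rer_monte_carlo(a, theta, sigma, n_samples=200_000, seed=0):
    """Estimate E_mu[ 0.5 * |Xi (b(X) - b_tilde(X; theta))|^2 ] by sampling mu."""
    rng = random.Random(seed)
    # Stationary distribution of dX = -a X dt + sigma dB is N(0, sigma^2 / (2a)).
    std = sigma / math.sqrt(2.0 * a)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, std)
        b_micro = -a * x          # microscopic drift b(x)
        b_model = -theta * x      # model drift b_tilde(x; theta)
        u = (b_micro - b_model) / sigma   # u = Xi (b - b_tilde), Xi = 1/sigma
        total += 0.5 * u * u
    return total / n_samples


if __name__ == "__main__":
    print(rer_closed_form(1.0, 0.5))
    print(rer_monte_carlo(1.0, 0.5, sigma=0.7))
```

In this toy case the RER does not depend on $\sigma$ and vanishes only at $\theta = a$, which is the minimizer that the variational problem (24) would select.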


5. Variational inference in coarse graining and path-space force matching 


The goal of this section is to obtain optimal parameters $\theta^*$ of a coarse-grained dynamics model $\{\bar X_t\}_{t\ge 0}$, eq. (13), approximating the microscopic dynamics $\{X_t\}_{t\ge 0}$, eq. (11), based on the path-space variational inference described by (23) and (24). We prove that minimization of the RE reduces to time dependent weighted least squares type problems with weights that depend on the CG mapping and the diffusion coefficient. For stationary regimes the time dependence is removed by minimizing the RER instead. The weighted least squares formulation provides a natural generalization of the force matching methods developed for systems at equilibrium, [33], to non-equilibrium systems. Moreover, the relation of RE and force matching type optimization methods is revealed, in analogy to the equilibrium case studied in [60], [67], [37].


5.1. Properties of the CG mapping and the reconstructed process 

We recall that the coarse-graining map $\Pi : \mathbb{R}^n \to \mathbb{R}^m$ defined in Section 3 is a linear transformation. We denote with the same letter $\Pi \in \mathbb{R}^{m\times n}$ the matrix representation of the linear map $\Pi$. Furthermore, we assume that the CG map has full rank,

$$\operatorname{rank}(\Pi) = m.$$

We consider the reconstructed process $\{\tilde X_t\}_{t\ge 0}$ as defined in Section 3.2, Theorem 3.1 and Remark 3.1, i.e., such that

$$\Pi\,\tilde b(x;\theta) = \bar b(\Pi x;\theta), \quad \text{and} \quad \tilde\sigma(x;\theta) = \sigma(x) \quad \text{for all } x \in \mathbb{R}^n.$$

Thus the coarse diffusion coefficient $\bar\sigma(x)$ is independent of the parameters $\theta$, and

$$\bar\Sigma(\Pi x) = \Pi\,\Sigma(x)\,\Pi^{\mathrm{tr}} \quad \text{for all } x \in \mathbb{R}^n. \tag{33}$$

The reconstructed drift term $\tilde b(x;\theta)$ is a vector field in $\mathbb{R}^n$ that can be written as

$$\tilde b(x;\theta) = \Pi^{\#}\,\bar b(\Pi x;\theta) + \big(I_n - \Pi^{\#}\Pi\big)\,y^{\perp}(x), \quad \text{for all } x \in \mathbb{R}^n, \tag{34}$$

where $I_n$ denotes the identity matrix in $\mathbb{R}^n$, $\Pi^{\#}$ is a right inverse of $\Pi$, an $n \times m$ matrix such that

$$\Pi\,\Pi^{\#} = I_m, \tag{35}$$

and $y^{\perp}(x)$ is any arbitrary vector in $\mathbb{R}^n$ satisfying $\Pi\,y^{\perp} = 0_m$. In principle, $\Pi^{\#}$ is not unique, though for any such $\Pi^{\#}$ it holds that $\bar b = \Pi\,\tilde b$, which is the main property that we need. We choose $\Pi$ to have full rank so that we have an explicit form of $\Pi^{\#}$, that is

$$\Pi^{\#} = \Pi^{\mathrm{tr}}\big(\Pi\,\Pi^{\mathrm{tr}}\big)^{-1}.$$

Most of the CG maps related to specific applications we consider are of full rank, e.g., mapping to the centers of mass of groups of atoms, or projecting to fewer state space coordinates, see Section 6.2. As already mentioned, and as relation (34) verifies, the reconstructed process is not unique. Therefore one can always choose the term $\big(I_n - \Pi^{\#}\Pi\big)\,y^{\perp}(x)$ of the reconstructed drift in (34) independent of the parameter $\theta$. In the rest of this work we assume that $\big(I_n - \Pi^{\#}\Pi\big)\,y^{\perp}(x)$ is independent of the parameter $\theta$.
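The explicit right inverse $\Pi^{\#} = \Pi^{\mathrm{tr}}(\Pi\Pi^{\mathrm{tr}})^{-1}$ and the kernel term in (34) can be checked numerically. The sketch below is only an illustration under stated assumptions: the $2\times 4$ matrix `Pi`, the test vector and all helper names are hypothetical, and the closed-form $2\times 2$ inverse restricts it to $m = 2$.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def inv2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    if abs(det) < 1e-14:
        raise ValueError("Pi Pi^tr is singular: Pi does not have full rank")
    return [[d / det, -b / det], [-c / det, a / det]]


def right_inverse(Pi):
    """Pi^# = Pi^tr (Pi Pi^tr)^{-1} for a full-rank 2 x n matrix Pi."""
    Pit = transpose(Pi)
    return matmul(Pit, inv2x2(matmul(Pi, Pit)))


def kernel_component(Pi, Pi_sharp, y):
    """(I_n - Pi^# Pi) y, the part of y that Pi maps to zero."""
    n = len(y)
    P = matmul(Pi_sharp, Pi)
    return [y[i] - sum(P[i][j] * y[j] for j in range(n)) for i in range(n)]


# Hypothetical CG map: two weighted averages of four scalar coordinates.
Pi = [[0.5, 0.5, 0.0, 0.0],
      [0.0, 0.0, 0.25, 0.75]]
Pi_sharp = right_inverse(Pi)
identity = matmul(Pi, Pi_sharp)                      # should equal I_2, eq. (35)
y_perp = kernel_component(Pi, Pi_sharp, [1.0, -2.0, 3.0, 0.5])
image = [sum(p * v for p, v in zip(row, y_perp)) for row in Pi]  # should vanish
```

The vanishing `image` reflects why the second term in (34) does not affect $\Pi\tilde b$, so it can be chosen freely and, as assumed above, independently of $\theta$.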


5.2. Optimal coarse-grained dynamics 

Having set up the optimization problems (23) and (24), in this section we look for optimal solutions $\theta^*(T)$ and $\theta^*$ respectively, based on the first order optimality condition. That is, if $\theta^*$ is a solution of (24) then

$$\nabla_\theta \mathcal{H}\big(P \,|\, Q^{\theta^*}\big) = 0. \tag{36}$$

Thus solutions of (36) reveal the local optima of the RER. Note that if the RER is a strictly convex function of $\theta$ then there is a unique global minimum $\theta^*$. This property clearly depends on the choice of the parametrized model, i.e., on the definition of the parametrized drift $\bar b(x;\theta)$; see Remark 5.1 for an example.


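Theorem 5.1. Let $\Pi : \mathbb{R}^n \to \mathbb{R}^m$ be a linear mapping with $\operatorname{rank}(\Pi) = m$. Consider the microscopic process $\{X_t\}_{t\ge 0}$ and the coarse process $\{\bar X_t\}_{t\ge 0}$ satisfying (11) and (13) respectively, with the drift $b(x)$, $\bar b(x;\theta)$ and the diffusion terms $\sigma(x)$, $\bar\sigma(x)$, such that (33) holds, $\operatorname{rank}(\sigma) = k$ and $X_0 \sim \mu_0$, $\bar X_0 \sim \bar\nu_0$. Let $\{\tilde X_t\}_{t\ge 0}$ be a reconstructed process of $\{\bar X_t\}_{t\ge 0}$ with the drift $\tilde b(x;\theta)$ defined in (34) and the diffusion coefficient $\sigma(x)$. Then, for any $\Pi^{\#}$ satisfying (35),

a)

$$\operatorname{argmin}_{\theta\in\Theta} \mathcal{R}\big(P_{[0,T]} \,|\, Q^{\theta}_{[0,T]}\big) = \operatorname{argmin}_{\theta\in\Theta} \mathbb{E}_{P_{[0,T]}}\left[\int_0^T \big\|\Pi b(X_s) - \bar b(\Pi X_s;\theta)\big\|^2_{\Pi^{\#}\Xi}\,ds\right],$$

where

$$\|z\|^2_{\Pi^{\#}\Xi} = z^{\mathrm{tr}}\,(\Pi^{\#})^{\mathrm{tr}}\,\Xi^{\mathrm{tr}}\,\Xi\,\Pi^{\#}\,z, \quad z \in \mathbb{R}^m,$$

and $\Xi = [\sigma^{\mathrm{tr}}(x)\sigma(x)]^{-1}\sigma^{\mathrm{tr}}(x)$.

b) If moreover $\{X_t\}_{t\ge 0}$ is stationary with the invariant measure $\mu$, then

$$\operatorname{argmin}_{\theta\in\Theta} \mathcal{H}(P \,|\, Q^{\theta}) = \operatorname{argmin}_{\theta\in\Theta} \mathbb{E}_{\mu}\left[\big\|\Pi b(X) - \bar b(\Pi X;\theta)\big\|^2_{\Pi^{\#}\Xi}\right].$$

We present the proof in Appendix A.5.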


Theorem 5.1 proves that the variational inference problems (23) and (24) are force matching type problems with the norm $\|\cdot\|_{\Pi^{\#}\Xi}$ instead of the usual Euclidean norm. Moreover, Theorem 5.1 a) is a time dependent force matching, i.e., force matching over paths, thus we may call the optimization problem the path-space force matching. Note that the calculations involved in the proof of Theorem 5.1 are independent of fluctuation-dissipation and detailed balance, thus they apply to non-equilibrium systems directly. Theorem 5.1 b) shows that relative entropy rate (RER) minimization and FM are essentially identical. In particular, formula (28) mathematically explains the difference between RE and FM at equilibrium and reveals what each method is doing: the one minimizes the relative entropy $\mathcal{R}(\mu \,|\, \nu_0)$ between the stationary distributions and the other the relative entropy rate $\mathcal{H}(P \,|\, Q^{\theta})$.

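Remark 5.1. The existence of the unique global minimum is guaranteed if, for example, the force field $\bar b(x;\theta)$ depends linearly on $\theta$. Let $\{\phi_k(x)\}_{k=1}^{K}$ be $\mathbb{R}^m$-valued polynomials on $\mathbb{R}^m$ and approximate the force field $\bar b(x;\theta)$ by

$$\bar b(x;\theta) = \sum_{k=1}^{K} \theta_k\,\phi_k(x).$$

In this case the minimization of the RER is the least squares fit with respect to the stationary measure $\mu(dx)$. Due to the linear dependence on $\theta$ the minimization problem has a unique solution due to strict convexity of the RER. Note that the RE is also convex in this case and uniqueness of the corresponding optimization problem is guaranteed. The optimization problem reduces to the solution of the linear system

$$\Phi\,\theta^* = a, \quad \text{where} \quad \Phi_{kl} = \mathbb{E}_{\mu}\big[\langle \phi_k(\Pi X), \phi_l(\Pi X)\rangle_{\Pi^{\#}\Xi}\big], \quad a_k = \mathbb{E}_{\mu}\big[\langle \phi_k(\Pi X), \Pi b(X)\rangle_{\Pi^{\#}\Xi}\big], \quad k,l = 1,\dots,K,$$

with $\langle u, v\rangle_{\Pi^{\#}\Xi} = u^{\mathrm{tr}}(\Pi^{\#})^{\mathrm{tr}}\Xi^{\mathrm{tr}}\Xi\,\Pi^{\#}v$ the inner product associated with the norm $\|\cdot\|_{\Pi^{\#}\Xi}$.

Moreover, this set-up is also used in estimation of non-parametric models, for example by specifying the basis set $\{\phi_i\}$ to be splines, or wavelets, (56j].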
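The linear system of Remark 5.1 can be sketched for a scalar toy problem. Everything specific below is an assumption made for illustration: the basis $(x, x^3)$, the target force $-x - 0.5x^3$, the Gaussian samples standing in for draws from the stationary measure $\mu$, and a unit weight in place of the norm $\|\cdot\|_{\Pi^{\#}\Xi}$. The expectations defining $\Phi$ and $a$ are replaced by sample means and the $2\times 2$ system is solved by Cramer's rule.

```python
import random


def basis(x):
    """Hypothetical polynomial basis (phi_1, phi_2) = (x, x^3)."""
    return (x, x ** 3)


def fit_linear_force(samples, target_force):
    """Solve Phi theta = a with Phi_kl = mean(phi_k phi_l), a_k = mean(phi_k b)."""
    n = len(samples)
    Phi = [[0.0, 0.0], [0.0, 0.0]]
    a = [0.0, 0.0]
    for x in samples:
        phi = basis(x)
        b = target_force(x)
        for k in range(2):
            a[k] += phi[k] * b / n
            for l in range(2):
                Phi[k][l] += phi[k] * phi[l] / n
    # Cramer's rule for the 2 x 2 normal equations.
    det = Phi[0][0] * Phi[1][1] - Phi[0][1] * Phi[1][0]
    theta1 = (a[0] * Phi[1][1] - Phi[0][1] * a[1]) / det
    theta2 = (Phi[0][0] * a[1] - Phi[1][0] * a[0]) / det
    return theta1, theta2


rng = random.Random(0)
# Stand-in samples; in practice these would come from the stationary simulation.
samples = [rng.gauss(0.0, 1.0) for _ in range(5000)]
theta = fit_linear_force(samples, lambda x: -x - 0.5 * x ** 3)
```

Because the assumed target force lies in the span of the basis, the fit recovers the coefficients $(-1, -0.5)$ up to rounding; with a force outside the span the same system returns the best least squares approximation with respect to the sampled measure.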


6. Effective Langevin dynamics and path-space force matching. 

We present the application of the path-space force matching described in the previous section 
to the stochastic Langevin dynamics for molecular systems. We propose Langevin-type coarse dy¬ 
namics with a parametrized force (and friction) and derive the optimal parameter set for which the 
path space relative entropy is minimized, both for finite time and stationary dynamics. Further¬ 
more, we present two specific coarse graining maps: (a) The transformation to the centers of mass 
of groups of atoms, and (b) the projection to a selected subset of atoms. 

The Langevin dynamics are described by the process $\{(q_t,p_t)\}_{t\ge 0}$ for an $N$-particle molecular system with positions $q = (q_1,\dots,q_N) \in \mathbb{R}^{3N}$ and momenta $p = (p_1,\dots,p_N) \in \mathbb{R}^{3N}$ which satisfies

$$\begin{cases} dq_t = M^{-1}p_t\,dt, \\ dp_t = F(q_t)\,dt - \gamma M^{-1}p_t\,dt + \sigma\,dB_t, \end{cases} \tag{37}$$

a Hamiltonian system coupled with a thermostat, where $F(q)$ is the force field that is not necessarily a gradient, see also the discussion in Section 3.1, $M = \operatorname{diag}(m_1I_3,\dots,m_NI_3) \in \mathbb{R}^{3N\times 3N}$ is the mass matrix, $\gamma \in \mathbb{R}^{3N\times 3N}$ and $\sigma \in \mathbb{R}^{3N\times 3N}$ are the friction and diffusion coefficients respectively, and $B_t$ is the $3N$-dimensional Brownian motion. The diffusion and friction coefficients satisfy the fluctuation-dissipation relation $\sigma\sigma^{\mathrm{tr}} = 2\beta^{-1}\gamma$.

Note that we have assumed that friction and diffusion coefficients are independent of the process, though our methodology is also applicable for $\gamma = \gamma(q)$ and $\sigma = \sigma(q)$, see Remark 6.5. The latter can be crucial if the microscopic and the CG molecular systems are not diffusive or when one wants to restrict the thermostat at the boundaries.

Let $M$ $(< N)$ be the degrees of freedom of the coarse space, i.e., the number of CG particles, and define the linear coarse-graining map $\Pi_q : \mathbb{R}^{3N} \to \mathbb{R}^{3M}$ by

$$(\Pi_q q)_j = \sum_{i=1}^{N} \zeta_{ji}\,q_i, \quad j = 1,\dots,M, \quad \text{for any } q \in \mathbb{R}^{3N}, \tag{38}$$

for any set $\zeta_{ji} \in \mathbb{R}$, $j = 1,\dots,M$, $i = 1,\dots,N$, such that $\operatorname{rank}(\Pi_q) = 3M$. $\Pi_q$ denotes the matrix representing the transformation (38),

$$\Pi_q = \big[\zeta_{ji}\,I_3\big]_{j=1,\dots,M;\ i=1,\dots,N} \in \mathbb{R}^{3M\times 3N},$$

where $I_3$ is the 3-dimensional identity matrix. Furthermore, we denote

$$\Pi = \begin{bmatrix} \Pi_q & 0 \\ 0 & \Pi_p \end{bmatrix}, \quad \Pi x = (\Pi_q q, \Pi_p p), \quad x = (q,p),$$

where $\Pi_p : \mathbb{R}^{3N} \to \mathbb{R}^{3M}$ denotes the momentum transformation. Let the CG particles have mass matrix

$$\bar M = \operatorname{diag}(\bar m_1I_3,\dots,\bar m_MI_3) \in \mathbb{R}^{3M\times 3M}.$$

The momentum mapping $\Pi_p$ is given by

$$\Pi_p = \bar M\,\Pi_q\,M^{-1}, \tag{39}$$

such that $\Pi_q\,dq_t = \Pi_q M^{-1}p_t\,dt = \bar M^{-1}\Pi_p p_t\,dt$. In the work the mass matrix $\bar M$ is defined such that a consistency condition in the momentum space is satisfied which, with (39), defines the mass of the CG particles $\bar m_j$, $j = 1,\dots,M$. The consistency condition states that the momentum probability distribution of the coarse variables is the same on the coarse space and the microscopic space, that is $e^{-\frac{\beta}{2}\bar p^{\mathrm{tr}}\bar M^{-1}\bar p} \propto \int e^{-\frac{\beta}{2}p^{\mathrm{tr}}M^{-1}p}\,\delta(\Pi_p p - \bar p)\,dp$.


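The proposed dynamics for the coarse variables $\bar x = (\bar q,\bar p) \in \mathbb{R}^{6M}$ are given by the Langevin system

$$\begin{cases} d\bar q_t = \bar M^{-1}\bar p_t\,dt, \\ d\bar p_t = \bar F(\bar q_t;\theta)\,dt - \bar\gamma \bar M^{-1}\bar p_t\,dt + \bar\sigma\,d\bar B_t, \end{cases} \tag{40}$$

where $\bar B_t$ is a $3M$-dimensional Brownian motion. The diffusion coefficient $\bar\sigma$ is defined, according to relation (33), such that

$$\bar\sigma\,\bar\sigma^{\mathrm{tr}} = \Pi_p\,\sigma\sigma^{\mathrm{tr}}\,\Pi_p^{\mathrm{tr}}.$$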

We examine two cases for the friction coefficient:

(a) It matches the coarse graining transformation of the microscopic friction forces,

$$\bar\gamma\,\Pi_q = \Pi_p\,\gamma. \tag{41}$$

(b) The fluctuation-dissipation relation is satisfied for the coarse space dynamics. Since the fluctuation-dissipation relation is satisfied for the microscopic dynamics, the friction coefficient in the CG dynamics must be

$$\bar\gamma = \frac12\,\beta\,\bar\sigma\bar\sigma^{\mathrm{tr}} = \Pi_p\,\gamma\,\Pi_p^{\mathrm{tr}}. \tag{42}$$

Hence, the parametrization of the coarse dynamics is described here only through the force

$$\bar F(\bar q;\theta), \quad \theta \in \Theta,$$

where $\Theta$ is the parameter space. The Langevin system (37) is written in the form of the SDE (11) if we set $n = 6N$, $x = (q,p)$,

$$b(x) = \big(M^{-1}p,\ F(q) - \gamma M^{-1}p\big)^{\mathrm{tr}},$$

and

$$\sigma(x) = \sigma_0 = \big(0_{3N},\ \sigma\big)^{\mathrm{tr}}.$$

We also associate the coarse-grained dynamics with the SDE (13) where $m = 6M$, $\bar x = (\bar q,\bar p)$ and

$$\bar b(\bar x;\theta) = \big(\bar M^{-1}\bar p,\ \bar F(\bar q;\theta) - \bar\gamma \bar M^{-1}\bar p\big)^{\mathrm{tr}}.$$


Note that $\operatorname{rank}(\sigma_0) = 3N$ and $\operatorname{rank}(\Pi) = 6M$ since $\operatorname{rank}(\Pi_q) = 3M$. Moreover, we assume that given the CG map we can find a reconstructed process as described in Section 3.2.

Applying the results of Sections 4.1 and 5 we provide the optimal parameter set $\theta^*$ for which the CG Langevin process $(\bar q_t,\bar p_t)_{t\ge 0}$ best approximates $\Pi(q_t,p_t)_{t\ge 0} = (\Pi_q q_t, \Pi_p p_t)_{t\ge 0}$, in the sense that the path space RE (or RER) is minimized, see Theorem 5.1. We state the result for the stationary regime and for the finite-time evolution.

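Stationary regime. Considering the description of the CG dynamics and the assumptions on $\gamma$, $\sigma$ and $\Pi$ mentioned above, the conditions of Theorem 5.1 b) are satisfied, thus we have that, for stationary microscopic Langevin dynamics,

$$\theta^* = \operatorname{argmin}_{\theta} \mathbb{E}_{\mu}\left[\frac12 \big\|\Pi b(x) - \bar b(\Pi x;\theta)\big\|^2_{\Pi^{\#}\Xi_0}\right],$$

where $\mu(dq,dp)$ is the stationary distribution for $\{(q_t,p_t)\}_{t\ge 0}$ and

$$\|z\|^2_{\Pi^{\#}\Xi_0} = z^{\mathrm{tr}}\,(\Pi^{\#})^{\mathrm{tr}}\,\Xi_0^{\mathrm{tr}}\,\Xi_0\,\Pi^{\#}\,z, \quad z \in \mathbb{R}^{6M},$$

with

$$\Xi_0 = [\sigma_0^{\mathrm{tr}}\sigma_0]^{-1}\sigma_0^{\mathrm{tr}},$$

which plays the role of $\Xi$ in Theorem 5.1, and $\Pi^{\#} \in \mathbb{R}^{6N\times 6M}$ is a right inverse of $\Pi$ that has the form

$$\Pi^{\#} = \begin{bmatrix} \Pi_q^{\#} & 0 \\ 0 & \Pi_p^{\#} \end{bmatrix}$$

for any $\Pi_q^{\#} \in \mathbb{R}^{3N\times 3M}$ such that $\Pi_q\Pi_q^{\#} = I_{3M}$ and $\Pi_p^{\#} = M\,\Pi_q^{\#}\,\bar M^{-1}$. Therefore,

$$\big\|\Pi b(x) - \bar b(\Pi x;\theta)\big\|^2_{\Pi^{\#}\Xi_0} = \big\|\Pi_pF(q) - \bar F(\Pi_q q;\theta) - (\Pi_p\gamma - \bar\gamma\,\Pi_q)M^{-1}p\big\|^2_{\Pi_p^{\#}\Xi},$$

where

$$\|u\|^2_{\Pi_p^{\#}\Xi} = u^{\mathrm{tr}}\,(\Pi_p^{\#})^{\mathrm{tr}}\,\Xi^{\mathrm{tr}}\,\Xi\,\Pi_p^{\#}\,u, \quad u \in \mathbb{R}^{3M},$$

and

$$\Xi = [\sigma^{\mathrm{tr}}\sigma]^{-1}\sigma^{\mathrm{tr}}.$$

If the friction coefficient is given as in the case (a), eq. (41), the optimal parameter set is

$$\theta^* = \operatorname{argmin}_{\theta} \mathbb{E}_{\mu}\left[\frac12 \big\|\Pi_pF(q) - \bar F(\Pi_q q;\theta)\big\|^2_{\Pi_p^{\#}\Xi}\right]. \tag{43}$$

This is exactly the force matching problem related to the Hamiltonian dynamics. On the other hand, if the friction coefficient is given by (b), eq. (42),

$$\theta^* = \operatorname{argmin}_{\theta} \mathbb{E}_{\mu}\left[\frac12 \big\|\Pi_pF(q) - \bar F(\Pi_q q;\theta) - (\Pi_p\gamma - \Pi_p\gamma\Pi_p^{\mathrm{tr}}\Pi_q)M^{-1}p\big\|^2_{\Pi_p^{\#}\Xi}\right]. \tag{44}$$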


Remark 6.1. The appearance of the term $(\Pi_p\gamma - \Pi_p\gamma\Pi_p^{\mathrm{tr}}\Pi_q)M^{-1}p$ in the optimization is the result of the difference of the friction forces that contribute to the CG particle, since we consider a fixed diffusion term $\bar\sigma$, which is related to the chosen stochastic CG dynamics (40).

Remark 6.2. Note that if $F(q) = -\nabla U(q)$, then

$$\mu_\beta(dq,dp) = Z^{-1}\exp\{-\beta H(q,p)\}\,dq\,dp$$

is the Gibbs distribution, where $H(q,p) = \frac12 p^{\mathrm{tr}}M^{-1}p + U(q)$. If $F(q) \ne -\nabla U(q)$, i.e., driven Langevin dynamics, the form of $\mu(dq,dp)$ is not analytically known, though in numerical implementations it is not needed, as the sets of samples used to estimate the average are found as solutions of the Langevin system at the stationary regime.


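Finite time regime. For parametrization on a finite time interval $[0,T]$, where the system has not reached stationarity, the approach to find (time dependent) optimal parameters $\theta^*(T)$ is given by application of Theorem 5.1 a). Using the same steps as for the stationary case, we have that the optimal parameter set for which the process $\{(\bar q_t,\bar p_t)\}_{t\ge 0}$ best approximates $\{(q_t,p_t)\}_{t\ge 0}$ on the time interval $[0,T]$ is given by, if $\bar\gamma\,\Pi_q = \Pi_p\,\gamma$,

$$\theta^*(T) = \operatorname{argmin}_{\theta} \mathbb{E}_{P_{[0,T]}}\left[\frac12\int_0^T \big\|\Pi_pF(q_s) - \bar F(\Pi_q q_s;\theta)\big\|^2_{\Pi_p^{\#}\Xi}\,ds\right], \tag{45}$$

and if $\bar\gamma = \Pi_p\gamma\Pi_p^{\mathrm{tr}}$,

$$\theta^*(T) = \operatorname{argmin}_{\theta} \mathbb{E}_{P_{[0,T]}}\left[\frac12\int_0^T \big\|\Pi_pF(q_s) - \bar F(\Pi_q q_s;\theta) - (\Pi_p\gamma - \Pi_p\gamma\Pi_p^{\mathrm{tr}}\Pi_q)M^{-1}p_s\big\|^2_{\Pi_p^{\#}\Xi}\,ds\right]. \tag{46}$$

Remark 6.3. (Connection to equilibrium force matching) When the system is at equilibrium the optimization problem is defined by (43) and (44), where the expectation is taken with respect to the Gibbs measure $\mu_\beta(dq,dp)$. As the averaged quantity in (43) does not depend on $p$, the $\mu_\beta(dq,dp)$ expectation is equal to the expectation with respect to $\mu(dq) = Z^{-1}\exp\{-\beta U(q)\}\,dq$. Therefore the optimization problem is

$$\theta^* = \operatorname{argmin}_{\theta} \mathbb{E}_{\mu}\left[\frac12 \big\|\Pi_pF(q) - \bar F(\Pi_q q;\theta)\big\|^2_{\Pi_p^{\#}\Xi}\right],$$

where $\Pi_p = \bar M\,\Pi_q\,M^{-1}$ and

$$\|u\|^2_{\Pi_p^{\#}\Xi} = u^{\mathrm{tr}}\,(\Pi_p^{\#})^{\mathrm{tr}}\,\Sigma^{-1}\,\Pi_p^{\#}\,u,$$

with $\Sigma = \sigma\sigma^{\mathrm{tr}}$ the covariance matrix. Moreover, if the diffusion coefficient $\sigma$ is constant, identical for all particles, then $\Sigma^{-1} = \sigma^{-2}I_{3N}$, and if

$$(\Pi_p^{\#})^{\mathrm{tr}}\,\Pi_p^{\#} = I_{3M},$$

then

$$\theta^* = \operatorname{argmin}_{\theta} \mathbb{E}_{\mu}\left[\frac{1}{2\sigma^2}\big\|\Pi_pF(q) - \bar F(\Pi_q q;\theta)\big\|^2\right],$$

where $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^{3M}$. Therefore the path-space force matching is equivalent to the force matching for systems at equilibrium. Note that the above condition holds, for example, if $\Pi_q$ is the mapping to the group's center of mass or an orthogonal map, see the examples in Section 6.2.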


Remark 6.4. We should also state here that in all the above applications a typical diffusive process is assumed for the dynamics of the molecular systems. However, for realistic molecular systems this might not be a good approximation due to sub-diffusive behavior and the importance of long-time dynamical behavior or memory in a generalized Langevin equation (GLE) framework. In such a case, either a scaling of the CG dynamics in a post-processing stage, or an approximation of the memory terms appearing in the GLE through a parameterization of the diffusion term in the Langevin equation, is required [^, |3,113, 25, 13, H, . This is the topic of future work.


Remark 6.5. (Parametrized friction) Note that if we consider that $\bar\gamma = \bar\gamma(\bar q;\vartheta)$ then the same methodology can be applied with $\bar b(\bar x;\theta,\vartheta) = \big(\bar M^{-1}\bar p,\ \bar F(\bar q;\theta) - \bar\gamma(\bar q;\vartheta)\bar M^{-1}\bar p\big)^{\mathrm{tr}}$. For this generalization one should be careful to ensure the existence of CG processes with such friction and of its stationary measure. For example, consider whether the fluctuation-dissipation condition holds. We also refer to related works [63|, l35|, l3l|,


6.1. Numerical implementation and optimal parameters 


In Appendix B we study the RER minimization problem induced by discrete time numerical schemes of the Langevin dynamics and overdamped Langevin dynamics. We consider the Langevin (37) and the coarse space Langevin (40) dynamics, apply the Brooks-Brunger-Karplus (BBK), [ 3 ], discretization scheme in both systems and study the RER defined for the discrete Markov chains, see (57).

We also study the discrete time RER minimization problem for the overdamped Langevin dynamics $\{X_t\}_{t\ge 0}$ in $\mathbb{R}^n$,

$$dX_t = -\frac12\Sigma(X_t)\nabla U(X_t)\,dt + \frac12\nabla\cdot\Sigma(X_t)\,dt + \sigma(X_t)\,dB_t, \tag{47}$$

with a $\theta$-parametrized coarse-grained approximation considered in the spirit of Section 3.2, presented in detail in Appendix B, using the Euler-Maruyama scheme. Here $\sigma(x) \in \mathbb{R}^{n\times k}$, $k \le n$, and $\Sigma(x) = \sigma(x)\sigma^{\mathrm{tr}}(x)$ is non-singular and positive definite. Note also that the invariant measure is $\mu(dx) = Z^{-1}\exp\{-U(x)\}\,dx$, where $Z$ is the normalizing constant. This example demonstrates the applicability of our approach for stochastic dynamics with multiplicative noise. Moreover, it proves once more that the optimal parameter set derived from the discrete time analogue of (47) does not depend on the discretization time step $h$ as $h \to 0$.
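To make the discrete-time setting concrete, the sketch below applies the Euler-Maruyama scheme to the simplest instance of (47). It is an illustration under stated assumptions rather than the scheme analyzed in Appendix B: the dimension is one, the noise is additive with constant $\sigma = \sqrt{2}$ (so the divergence term in (47) vanishes) and $U(x) = x^2/2$, for which the invariant measure $Z^{-1}e^{-U(x)}dx$ is the standard normal. The function name and parameter values are hypothetical.

```python
import math
import random


def euler_maruyama_second_moment(sigma=math.sqrt(2.0), h=0.01,
                                 n_steps=200_000, seed=0):
    """Time average of X^2 along an Euler-Maruyama path of
    dX = -(1/2) sigma^2 U'(X) dt + sigma dB with U(x) = x^2 / 2."""
    rng = random.Random(seed)
    x = 0.0
    second_moment = 0.0
    sqrt_h = math.sqrt(h)
    for _ in range(n_steps):
        drift = -0.5 * sigma ** 2 * x          # -(1/2) Sigma U'(x)
        x += drift * h + sigma * sqrt_h * rng.gauss(0.0, 1.0)
        second_moment += x * x
    return second_moment / n_steps


if __name__ == "__main__":
    # Should be close to 1, the variance of the invariant measure exp(-x^2/2)/Z.
    print(euler_maruyama_second_moment())
```

For this linear example the discrete chain has stationary variance $1/(1 - h/2)$, so the discretization bias is of order $h$ and disappears as $h \to 0$.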

The reason for studying the optimization through the discrete schemes is that (i) the numerical parametrization and minimization are always done in the context of discretized dynamics, and (ii) it demonstrates clearly the passage from the Markov chain approximation to the continuous time process in the variational problem that defines the best-fit procedure. Indeed, as proved in Theorem \2l, Appendix B, when the discretization time step of the numerical scheme tends to zero the local minima of the RER agree with the results for the continuous time case, Theorem 5.1 b).


6.2. Examples of coarse graining maps

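Center of mass. We consider the linear transformation $\Pi_q$, eq. (38), with

$$\zeta_{ji} = \bar m_j^{-1}\,m_i\,\chi_{C_j}(i), \quad j = 1,\dots,M, \quad i = 1,\dots,N,$$

that maps $N/M$ microscopic particles to their center of mass (the CG particle), where $m_i$ is the mass of the $i$-th particle, $\bar m_j = \sum_{i\in C_j} m_i$ and $\chi_{C_j}$ is the indicator function of the set $C_j = \{i : \text{particle } i \text{ contributes to CG particle } j\}$. In order to simplify the demonstration we consider the special case where $M = 2$ and $N$ is even and

$$\Pi_q = \begin{bmatrix} \frac{m_1}{\bar m_1}I_3 & \dots & \frac{m_{N/2}}{\bar m_1}I_3 & 0 & \dots & 0 \\ 0 & \dots & 0 & \frac{m_{(N/2)+1}}{\bar m_2}I_3 & \dots & \frac{m_N}{\bar m_2}I_3 \end{bmatrix},$$

where $\bar m_1 = \sum_{i=1}^{N/2} m_i$, $\bar m_2 = \sum_{i=(N/2)+1}^{N} m_i$, $\operatorname{rank}(\Pi_q) = 6 = 3M$, and

$$\Pi_p = \bar M\,\Pi_q\,M^{-1} = \begin{bmatrix} I_3 & \dots & I_3 & 0 & \dots & 0 \\ 0 & \dots & 0 & I_3 & \dots & I_3 \end{bmatrix},$$

with $\Pi_p^{\#} = M\,\Pi_q^{\#}\,\bar M^{-1} = \Pi_p^{\mathrm{tr}}$. Thus the weighted norm $\|\cdot\|_{\Pi_p^{\#}\Xi}$ appearing in the optimization problems (43) - (46), for the case of identical and constant covariance coefficients $\sigma$ for which $\Sigma^{-1} = \sigma^{-2}I_{3N}$, is

$$\|u\|^2_{\Pi_p^{\#}\Xi} = \sigma^{-2}\,u^{\mathrm{tr}}(\Pi_p^{\#})^{\mathrm{tr}}\Pi_p^{\#}u = \frac{N}{2\sigma^2}\|u\|^2, \quad u = (u_1,u_2)^{\mathrm{tr}} \in \mathbb{R}^6,$$

as $\Pi_p^{\#}u = (u_1,\dots,u_1,u_2,\dots,u_2)^{\mathrm{tr}}$. The minimization problem (43) becomes

$$\theta^* = \operatorname{argmin}_{\theta} \mathbb{E}_{\mu}\left[\frac{N}{4\sigma^2}\sum_{j=1}^{2}\Big\|\sum_{i\in C_j}F_i(q) - \bar F_j(\Pi_q q;\theta)\Big\|^2\right],$$

for $\bar F(\bar q;\theta) = (\bar F_1(\bar q;\theta), \bar F_2(\bar q;\theta))^{\mathrm{tr}} \in \mathbb{R}^6$, where $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^3$.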
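The center-of-mass coefficients and the resulting momentum map can be assembled and checked with a few lines. This is a minimal sketch with hypothetical masses and groups; it works with the scalar coefficients $\zeta_{ji}$ only, omitting the $I_3$ blocks, and the function name is an assumption.

```python
def center_of_mass_maps(masses, groups):
    """Scalar coefficient matrices of Pi_q (eq. (38)) and Pi_p (eq. (39)).

    masses: list of microscopic masses m_i.
    groups: list of index sets C_j, one per CG particle.
    """
    n = len(masses)
    pi_q, pi_p = [], []
    for members in groups:
        m_bar = sum(masses[i] for i in members)   # CG mass: total group mass
        # zeta_ji = m_i / m_bar_j for i in C_j, zero otherwise
        row_q = [masses[i] / m_bar if i in members else 0.0 for i in range(n)]
        # Pi_p = M_bar Pi_q M^{-1}, computed entry by entry
        row_p = [m_bar * row_q[i] / masses[i] for i in range(n)]
        pi_q.append(row_q)
        pi_p.append(row_p)
    return pi_q, pi_p


masses = [1.0, 2.0, 3.0, 4.0]      # hypothetical masses, N = 4
groups = [{0, 1}, {2, 3}]          # M = 2 CG particles
pi_q, pi_p = center_of_mass_maps(masses, groups)
```

Each row of `pi_q` sums to one (a weighted average of positions), while `pi_p` has unit entries on the members of each group, so the CG momentum is the sum of the member momenta and $\Pi_pF$ is the total force on the group.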


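An orthogonal projection. A simple example of an orthogonal mapping is the projection on the first $M$ coordinates, i.e., for $q = (q_1,\dots,q_N)^{\mathrm{tr}}$: $\Pi_q q = (q_1,\dots,q_M)^{\mathrm{tr}}$, represented by the $3M \times 3N$ matrix

$$\Pi_q = \big[I_{3M} \ \ 0_{3M\times(3N-3M)}\big],$$

and

$$\Pi_p = \bar M\,\Pi_q\,M^{-1} = \big[I_{3M} \ \ 0_{3M\times(3N-3M)}\big] = \Pi_q,$$

with $\bar M = \operatorname{diag}(m_1I_3,\dots,m_MI_3)$, for which it holds that $\Pi_p^{\#} = \Pi_p^{\mathrm{tr}}$. When the diffusion coefficient $\sigma$ is a constant, the same for all particles, the weighted norm appearing in the optimization problems (43) - (44) is

$$\|u\|^2_{\Pi_p^{\#}\Xi} = \frac{1}{\sigma^2}\|u\|^2 \quad \text{for any } u \in \mathbb{R}^{3M},$$

and the minimization problem becomes

$$\theta^* = \operatorname{argmin}_{\theta} \mathbb{E}_{\mu}\left[\frac{1}{2\sigma^2}\sum_{i=1}^{M}\big\|F_i(q) - \bar F_i(\Pi_q q;\theta)\big\|^2\right].$$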


7. Data driven coarse-graining and path-space variational inference 


In the previous sections we built optimal coarse-grained dynamics such as (40) as approximations of microscopic stochastic dynamics in the same general class of stochastic differential equations, e.g., the microscopic Langevin dynamics (37). The path-space information approach proposed there allowed us to systematically compare microscopic and coarse-grained dynamics using the path-space relative entropy approach in Theorem 5.1 and to set up the corresponding path-space variational inference problems for the parameters.

In this section we explore a different but asymptotically equivalent (in the number of data) 
perspective, which is data-driven in the sense that it treats any available microscopic simulator 
purely as a means of producing statistical data in the form of time series. 

The primary new elements of this coarse-graining approach are: (a) We derive the parametrization of the coarse-grained models by optimizing their information content (in the path space) with respect to available time series data $\mathcal{D}$ coming from a fine-scale simulation. We use computable formulas similar to those for the relative entropy rate (RER) discussed earlier; (b) In (a) we do not need that the microscopic scale time series data $\mathcal{D}$ are obtained from a model $P_{[0,T]}$ in the same mathematical class as the coarse-grained models or that it is even Markovian. Thus we do not require a microscopic Langevin model $P_{[0,T]}$, as was done in Section 5 and Section 6, or any other explicitly known molecular dynamics, at least provided sufficient data is available; (c) Due to information inequalities such as (5) and (7), the present data-driven, path information-based coarse-graining methodology implies transferability of the parameterization to different observables, at a specific thermodynamic point, by only training a single observable, namely the relative entropy. Numerical tests for the verification of this observation are the subject of ongoing work. We also refer to some recent prior work that relates observables and relative entropy, [23, 73]; (d) The relative entropy rate (RER) approach in (24), which is also reflected in our data-driven approach, see for instance eq. (56) below, allows us to train models to be predictive at long-time regimes, and not just for any fixed finite time window $[0,T]$.

One of the key points in this method is taking advantage of the ordering of the distributions in the relative entropy, which allows us to write the path-space variational inference problem as an average over the available fine-scale time series data. In this sense the method relies specifically on the availability of "big data" from the microscopic solver. Furthermore, it provides a systematic approach to compress them in the form of the coarse-grained model $\bar Q^{\theta}_{[0,T]}$ with controlled loss of information measured by the RER, in analogy to information criteria, [H, (71| . see formula (10).


For simplicity in the presentation, we focus on discrete in time parametrized coarse dynamics, for instance numerical schemes for Langevin equations, such as the ones considered in Appendix B. In analogy to the parametrized dynamics (40), we consider the parametrized coarse-grained transition probabilities $\bar p^{\theta}(\Pi X, \Pi X')$, i.e., the probability for the CG state $\Pi X'$ given that the system is at $\Pi X$, that correspond to a discretization scheme for (40), see also Appendix B. Then the corresponding coarse-grained path-distribution is, assuming Markovianity,

$$\bar Q^{\theta} = \bar Q^{\theta}(\Pi X_0,\dots,\Pi X_T) = \mu(\Pi X_0)\,\bar p^{\theta}(\Pi X_0,\Pi X_1)\cdots\bar p^{\theta}(\Pi X_{T-1},\Pi X_T), \tag{48}$$

where $\mu$ denotes the initial distribution of the process and $\bar{\mathcal{D}} = \{\Pi X_1,\Pi X_2,\dots,\Pi X_T\}$ is a typical coarse-grained time series corresponding to the microscopic time series $\mathcal{D} = \{X_1,X_2,\dots,X_T\}$.

Finite-time regime: We first consider an ensemble of fine-scale data given in the form of $M$
time series $\mathcal{D}^k = \{X_1^k, X_2^k, \dots, X_T^k\}$, $k = 1, \dots, M$, obtained from a fine-scale molecular
simulation algorithm, up to a prescribed time horizon $T$. The corresponding coarse space time series is
$\bar{\mathcal{D}}^k = \{\Pi X_1^k, \Pi X_2^k, \dots, \Pi X_T^k\}$, $k = 1, \dots, M$. We note here that in this case we do not necessarily
need this data set to be obtained from a Langevin or any other Markovian algorithm. Then the
typically unknown path-space distribution of this coarse-space time series is denoted by

\[ \bar{P} = \bar{P}(\Pi X_0, \dots, \Pi X_T). \]

Note that $\bar{P}$ is the push-forward of the microscopic measure $P$, see relation (3), Section 2.1.
Furthermore, $\bar{P}$ is computationally accessible from the microscopic simulation, which samples $P$ and
projects the $X_i$'s onto the $\Pi X_i$'s.

On the other hand we also consider the parametrized coarse-grained family defined in (48),
$Q^\theta = Q^\theta(\Pi X_0, \dots, \Pi X_T)$. In order to obtain the optimal parametrized coarse-grained transition
probabilities $p^\theta(\Pi X, \Pi X')$, for a parameter vector $\theta = \theta^*$, we need to minimize the path-space
relative entropy, i.e., consider the minimization problem

\[ \theta^* = \arg\min_\theta \mathcal{R}(\bar{P} \,|\, Q^\theta). \tag{49} \]

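Furthermore, for the path-space relative entropy we have by the law of large numbers that

\[ \mathcal{R}(\bar{P} \,|\, Q^\theta) = \lim_{M \to \infty} \mathcal{R}_M(\bar{P} \,|\, Q^\theta), \]

where we define the unbiased estimator for the relative entropy

\[ \mathcal{R}_M(\bar{P} \,|\, Q^\theta) := \frac{1}{M} \sum_{k=1}^{M} \log \frac{\bar{P}(\Pi X_1^k, \Pi X_2^k, \dots, \Pi X_T^k)}{Q^\theta(\Pi X_1^k, \Pi X_2^k, \dots, \Pi X_T^k)}. \tag{50} \]

Therefore, the minimization principle (49) becomes

\[ \arg\min_\theta \mathcal{R}_M(\bar{P} \,|\, Q^\theta)
 = \arg\max_\theta \left\{ \frac{1}{M} \sum_{k=1}^{M} \log Q^\theta(\Pi X_1^k, \dots, \Pi X_T^k)
 - \frac{1}{M} \sum_{k=1}^{M} \log \bar{P}(\Pi X_1^k, \dots, \Pi X_T^k) \right\}
 = \arg\max_\theta \frac{1}{M} \sum_{k=1}^{M} \log Q^\theta(\Pi X_1^k, \dots, \Pi X_T^k), \tag{51} \]

where the last equality holds because the sum involving $\bar{P}$ does not depend on $\theta$. This is a principle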


which does not require a priori the knowledge of the microscopic probability distribution or its
push-forward $\bar{P}(\Pi X_1^k, \Pi X_2^k, \dots, \Pi X_T^k)$. Therefore, we obtain from (48) and (51) the following
maximization principle

\[ \theta^* \approx \arg\max_\theta \left\{ \sum_{i=1}^{T} \sum_{k=1}^{M} \log p^\theta(\Pi X_i^k, \Pi X_{i+1}^k) + \sum_{k=1}^{M} \log \mu(\Pi X_0^k) \right\}. \tag{52} \]

For a time window $T \gg 1$ the last term in (52), which involves the initial data, becomes negligible;
therefore we obtain a coarse-grained path-space likelihood maximization principle and the
corresponding estimator $\hat{\theta}(M,T)$ of $\theta^*$:

\[ \hat{\theta}(M,T) = \arg\max_\theta L\big(\theta; \{\Pi X_i^k\}_{i=1,k=1}^{T,M}\big), \tag{53} \]

where

\[ L\big(\theta; \{\Pi X_i^k\}_{i=1,k=1}^{T,M}\big) := \sum_{i=1}^{T} \sum_{k=1}^{M} \log p^\theta(\Pi X_i^k, \Pi X_{i+1}^k). \tag{54} \]
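The likelihood maximization in (53)-(54) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the coarse data are replaced by synthetic one-dimensional paths, the parametrized transition density is the Gaussian Euler kernel of a linear drift $-\theta x$ with unit noise, and the names `simulate_paths`, `log_transition`, `path_log_likelihood` and `maximize_on_grid` are hypothetical.

```python
import math
import random

def simulate_paths(theta_true, h, T, M, rng):
    """Euler paths of dX = -theta*X dt + dB; a synthetic stand-in for coarse data."""
    paths = []
    for _ in range(M):
        x = rng.gauss(0.0, math.sqrt(0.5 / theta_true))  # roughly stationary start
        path = [x]
        for _ in range(T):
            x = x - theta_true * x * h + math.sqrt(h) * rng.gauss(0.0, 1.0)
            path.append(x)
        paths.append(path)
    return paths

def log_transition(theta, x, x_next, h):
    """log p^theta(x, x') for the Gaussian Euler transition N(x - theta*x*h, h)."""
    mean = x - theta * x * h
    return -0.5 * math.log(2.0 * math.pi * h) - (x_next - mean) ** 2 / (2.0 * h)

def path_log_likelihood(theta, paths, h):
    """L(theta; data): sum of log transition densities over steps i and replicas k."""
    return sum(log_transition(theta, p[i], p[i + 1], h)
               for p in paths for i in range(len(p) - 1))

def maximize_on_grid(paths, h, grid):
    """Crude argmax over a parameter grid (an optimizer would be used in practice)."""
    return max(grid, key=lambda th: path_log_likelihood(th, paths, h))

rng = random.Random(0)
paths = simulate_paths(theta_true=1.0, h=0.01, T=200, M=100, rng=rng)
grid = [0.05 * j for j in range(61)]   # candidate theta values in [0, 3]
theta_hat = maximize_on_grid(paths, 0.01, grid)
print(theta_hat)
```

The estimate fluctuates around the value used to generate the data; the spread shrinks as $M$ and $T$ grow, which is the sense in which the method relies on large amounts of fine-scale data.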

Note that if the transition probabilities in (54) are replaced with a stationary measure and
the data set consists of independent samples $\mathcal{D} = \{X_1, X_2, \dots, X_T\}$, then (53) becomes the classical
Maximum Likelihood Principle (MLE), Q, In this sense (I52p is a maximum likelihood for the 
coarse-grained time series, $\bar{\mathcal{D}} = \{\Pi X_1, \Pi X_2, \dots, \Pi X_T\}$, of the fine-scale process, and thus includes
dynamics information and temporal correlations.

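Stationary regime: If the time series associated with the data set $\mathcal{D}^k = \{X_1^k, X_2^k, \dots, X_T^k\}$,
$k = 1, \dots, M$, are stationary, then they are statistically indistinguishable and we will eventually drop
the index $k$, referring to the data set as $\mathcal{D} = \{X_1, X_2, \dots, X_T\}$. In this case the estimator in (54)
simplifies significantly. Indeed, for $M \gg 1$, we have that

\[ L\big(\theta; \{\Pi X_i^k\}_{i=1,k=1}^{T,M}\big) = \sum_{i=1}^{T} \sum_{k=1}^{M} \log p^\theta(\Pi X_i^k, \Pi X_{i+1}^k) \approx M \sum_{i=1}^{T} \mathbb{E}_{\bar{P}}\big[\log p^\theta(\Pi X_i, \Pi X_{i+1})\big], \tag{55} \]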

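where $\mathbb{E}_{\bar{P}}$ denotes the expectation with respect to the path distribution $\bar{P}(\Pi X_1, \Pi X_2, \dots, \Pi X_T)$. Using
the stationarity of the time series in (55) we have that

\[ \sum_{i=1}^{T} \mathbb{E}_{\bar{P}}\big[\log p^\theta(\Pi X_i, \Pi X_{i+1})\big] = T\, \mathbb{E}_{\bar{P}}\big[\log p^\theta(\Pi X_1, \Pi X_2)\big]. \]

Moreover, due to stationarity we have the unbiased estimator for $\mathbb{E}_{\bar{P}}[\log p^\theta(\Pi X_1, \Pi X_2)]$,
using a single time series $\mathcal{D} = \{X_1, X_2, \dots, X_T\}$:

\[ \mathbb{E}_{\bar{P}}\big[\log p^\theta(\Pi X_1, \Pi X_2)\big] \approx \frac{1}{T} \sum_{i=1}^{T} \log p^\theta(\Pi X_i, \Pi X_{i+1}). \]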

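Therefore the stationary analogue of (53) is

\[ \theta^* \approx \hat{\theta}_s(T) = \arg\max_\theta L_s\big(\theta; \{\Pi X_i\}_{i=1}^{T}\big), \tag{56} \]

where for the stationary time series $\mathcal{D} = \{X_1, X_2, \dots, X_T\}$ we define

\[ L_s\big(\theta; \{\Pi X_i\}_{i=1}^{T}\big) := \sum_{i=1}^{T} \log p^\theta(\Pi X_i, \Pi X_{i+1}). \tag{57} \]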
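For a single stationary time series the maximizer of the likelihood in (57) is sometimes available in closed form. The sketch below assumes, purely for illustration, a scalar coarse variable and the Gaussian Euler transition density $N(x - \theta x h, h)$; for this model, setting the $\theta$-derivative of $L_s$ to zero gives $\hat{\theta}_s = -\sum_i x_i (x_{i+1} - x_i) / (h \sum_i x_i^2)$. The function names are hypothetical.

```python
import math
import random

def stationary_series(theta_true, h, T, seed=1):
    """One long Euler path of dX = -theta*X dt + dB (synthetic stand-in for Pi X_i)."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, math.sqrt(0.5 / theta_true))
    xs = [x]
    for _ in range(T):
        x = x - theta_true * x * h + math.sqrt(h) * rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs

def stationary_estimator(xs, h):
    """Closed-form maximizer of L_s for the transition density N(x - theta*x*h, h)."""
    num = sum(x * (xn - x) for x, xn in zip(xs, xs[1:]))
    den = sum(x * x for x in xs[:-1])
    return -num / (h * den)

xs = stationary_series(theta_true=1.0, h=0.01, T=100000)
print(stationary_estimator(xs, 0.01))
```

Only one trajectory is used here, in contrast to the ensemble of $M$ replicas needed in the finite-time regime.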

Time dependent regime: The optimization principle in (53) can also be easily extended so that we
can obtain a further improved, but computationally more costly, time-dependent optimal
parametrization for (49), where now we seek

\[ \theta^* = \theta^*(i), \quad \text{where } 0 \le i \le T, \]


421, 641, 
solving the approximate optimization problem for $\theta = \theta(i)$, $i = 0, \dots, T-1$,


T-l M 

arg min ' ' 

e 


T-l M 

j;^iog/«(nx^nx^ 

^ _n j _1 


■ti) 


i=0 k=l 


In [3^ the authors also obtained data-driven parametrization of multi-scale diffusions on a finite 
time window [0, T] based on minimizing ([1]) for particular observables (pi. On the other hand, in ^ 
a similar problem was considered for multi-scale diffusions, where coarse-grained parametrization 
was based on minimizing instead information metrics, e.g., 


Remark 7.1. Confidence intervals for the estimator $\hat{\theta}_s(T)$ in (57) can be provided in terms of
the asymptotic normality (in $T \gg 1$) of the estimator. The corresponding (Gaussian) asymptotic
variance is given in terms of the inverse of the path-space Fisher information matrix (FIM). We
refer to 42. IbSl. [73| for a discussion of such results, as well as the definition of the path FIM and 


its use for sensitivity analysis of stochastic dynamics that include Langevin dynamics and kinetic 
Monte Carlo algorithms. 
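As a rough illustration of Remark 7.1, the sketch below attaches an approximate 95% interval to the stationary estimator. It rests on assumptions that are not from the paper: a scalar coarse variable, the Gaussian Euler transition density $N(x - \theta x h, h)$, and the observed information $-\partial_\theta^2 L_s = h \sum_i x_i^2$ used as a plug-in for the path-space FIM. The function name `estimate_with_ci` is hypothetical.

```python
import math
import random

def estimate_with_ci(xs, h, z=1.96):
    """MLE and approximate 95% interval for theta in x' ~ N(x - theta*x*h, h)."""
    num = sum(x * (xn - x) for x, xn in zip(xs, xs[1:]))
    den = sum(x * x for x in xs[:-1])
    theta_hat = -num / (h * den)
    info = h * den                 # observed information: -d^2 L_s / d theta^2
    se = 1.0 / math.sqrt(info)     # Gaussian asymptotics: variance ~ 1 / information
    return theta_hat, (theta_hat - z * se, theta_hat + z * se)

# Synthetic single time series generated with theta = 1 (illustrative only).
rng = random.Random(2)
h, x, xs = 0.01, 0.0, [0.0]
for _ in range(50000):
    x = x - 1.0 * x * h + math.sqrt(h) * rng.gauss(0.0, 1.0)
    xs.append(x)

theta_hat, (lo, hi) = estimate_with_ci(xs, h)
print(theta_hat, lo, hi)
```

The interval width scales like $1/\sqrt{T h}$ for this model, so it narrows with the total simulated time rather than with the number of time steps alone.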


8. Conclusions 

In this work we presented a thorough examination of coarse-graining of non-equilibrium molecular
systems using path-wise information metrics. We have introduced the minimization problem
for optimizing coarse models based on relative entropy for comparing continuous time diffusion 
processes. The derived scheme is similar to the widely applied force-matching method used in 
computational coarse-graining which, however, is restricted to equilibrium processes. 

The main novelties of the proposed approach are summarized in the following points: (a) It is
applicable to transient regimes of non-equilibrium processes, since it directly involves information
along the whole path-space; (b) It connects the path-space relative entropy minimization with an
(extended) force matching problem for continuous time dynamics; (c) It becomes entirely
data-driven when the microscopic dynamics are replaced with corresponding correlated data in the form
of time series. From a more general perspective the proposed scheme is directly related to
dynamical data fitting as well as to machine learning algorithms. Indeed, the path-space information
approach allowed us to relate the RER minimization problem to corresponding parameter
optimization problems, obtained from data-driven methodologies, in the sense that it treats the microscopic
simulator as a means of producing statistical data in the form of time series; (d) The interpretation of
the dynamics as a continuous time process demonstrates that the RER for stationary dynamics is
independent of the time step for any numerical discretization scheme. Most importantly, the RER
perspective shows that the corresponding optimization method is extendable to infinite times and
non-reversible systems, as is demonstrated in the current study for continuous time diffusion
processes and in [42] for Markov chains; (e) The approach is generally applicable to stochastic dynamics
such as kinetic Monte Carlo algorithms and reaction networks.

Current work concerns the numerical application of the proposed RER minimization
methodology to coarse-graining of molecular systems under equilibrium and non-equilibrium conditions.
Appendix A. Proofs

Appendix A.1. Proof of Theorem 3.1

We consider a coarse observable function f{x) = giYix), where / € (7^(1^”;]^) and g G 
and such that the conditions of Theorem [Q are guaranteed. denotes the 

space of twice differentiable functions / : M” —> M. 

From the martingale problem, 61, Section 8.3], [s^, given 6, S, defined in relations (1181) and 
dm), there exists unique process {Xt}t>o which is the solution of (fTH]) such that 

Xt = PlXt, in distribution. 


Denote 


^f{x) = ^ hi{x)^{x) + \^ L (^) 


2 = 1 




dxidxj 


with generator of Xt. For f{x) = (^(Ilx), using relations (fT8l) and (fT^ . see also [6l|, Lemma 7.3.2], 
we can write 

™ - An 1 "■ _ B'^n 


;.i - ij.i 

Let T > 0 be a stopping time, from Ito’s formula 


f{Xt) = f{x)+ f Cf{Xs)ds+ fvfiXsY^ 
Jo Jo 


adB. 


where Xq = x, by applying expectation on both sides. 


mfi^r)] = f{x)+E^ 


fcf{Xs)ds+ fxfiXsY^ 

Jo Jo 


adB. 


= 5(nx) -|- E' 


rix 




7=1 


1 ^ 7)2 


I,J=l 


Thus 


E-[f{Xr)]=E^^[g{X^)], 


for any coarse observable f{x). 


□ 
Appendix A.2. Girsanov Theorem

The Girsanov theorem states the conditions under which the path space measures $P_{[0,T]}$,
$Q^\theta_{[0,T]}$ are absolutely continuous and provides a closed form of the Radon-Nikodym density
$\frac{dP_{[0,T]}}{dQ^\theta_{[0,T]}}(X_t, t)$.

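Suppose that there exists a process $\{u(X_s; \theta)\}_{s \ge 0}$ in $\mathbb{R}^k$ such that

\[ \sigma(X_s)\, u(X_s; \theta) = b(X_s) - b(X_s; \theta), \]

and satisfies Novikov's condition

\[ \mathbb{E}\left[ e^{\frac{1}{2} \int_0^T |u(X_s; \theta)|^2\, ds} \right] < \infty, \]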


[^ . We dehne 


Mt := ^(Xo)exp|-^*(u(X,;0),dR,) j\u{X,-e)\^ ds^ , 

where {u{Xs]6),dBs) = YM=i'^i{Xs':9)dBl and \u{Xs;6)\‘^ = Yli=i'^ii^s'i9). Then the Girsanov 
theorem yields that Q ^ is absolutely continuous with respect to T’[o,t] j Q[o t] ^lo,T] ’ 


dP 


[0,T] 


dQ^ 


{Xt,t) = Mt. 


(A.l) 


[0,T] 


Furthermore, the process Bt ■= Jq u(Xs; 0) ds+Bt is a /c-dimensional Brownian motion with respect 
to -P[o,r] • 

Appendix A.3. Proof of Theorem 4.1
a) The Novikov condition 


E 


< oo 


e^/o \u(Xs-fi)\'^ds 

ensures that TZ ^-P[o,r] I Q[ot^ < oo- Thus 

p (^[o,T] I Qfo,r]) “ {Plo,T] I Qfo,r]) -PiTol = 


= lEPr 


[0,T] 


d^fo.T] _ 


= IEr 


[O.T] 


= lEPr 


[0,T] 


[ {uiXs;e),dB,)- [ \uiXs-,e)\‘^ds 

Jo Jo 

r{uix,-e), 

Jo 


- 7^(^0 I i^o) 


dBg + u{Xs] 9) ds) 


= lEp, 


[0,T] 


- / {u{Xs-,9),dB,) 


= 0 , 


the last term is zero since dBg is a Brownian motion with respect to T’[o,t]- 
b) Since {W}t>o is stationary 


i^^(^[o,T]IQfo,r])=TE^ 


1 
thus, based on the representation (26) of the relative entropy,


lim 

T^OO 



lim 

T— )-CX) 




E. 




Recalling the definition of the RER, 

niP I Q^) = (P[o,r] I Qfo,T]) 


we conclude that P{P\Q^) = E^ [i|u(X;0)p]. 

Appendix A.4. Proof of Corollary 4.1

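By assumption there exists $u(x; \theta)$ such that

\[ \sigma(x)\, u(x; \theta) = b(x) - b(x; \theta), \quad \text{for all } x \in \mathbb{R}^n. \]

Since $\mathrm{rank}(\sigma(x)) = r = k$, i.e., $\sigma(x)$ has full rank, $\sigma^T(x)\sigma(x) \in \mathbb{R}^{k \times k}$ is invertible, thus a
solution $u(x; \theta) \in \mathbb{R}^k$ is given by

\[ u(x; \theta) = \big[\sigma^T(x)\sigma(x)\big]^{-1} \sigma^T(x) \big(b(x) - b(x; \theta)\big) = \Xi(x) \big(b(x) - b(x; \theta)\big), \]

where $\Xi(x) = [\sigma^T(x)\sigma(x)]^{-1} \sigma^T(x)$ and

\[ |u(x; \theta)|^2 = u^T(x; \theta)\, u(x; \theta)
 = \big(b(x) - b(x; \theta)\big)^T \Xi^T(x)\, \Xi(x) \big(b(x) - b(x; \theta)\big) = \|b(x) - b(x; \theta)\|_{\Xi}^2. \]

From Theorem 4.1 b) the RER is

\[ \mathcal{H}(P \,|\, Q^\theta) = \mathbb{E}_\mu\left[ \tfrac{1}{2} |u(X; \theta)|^2 \right], \]

thus substituting the previously derived form of $u(x; \theta)$ we prove (30),

\[ \mathcal{H}(P \,|\, Q^\theta) = \mathbb{E}_\mu\left[ \tfrac{1}{2} \|b(x) - b(x; \theta)\|_{\Xi}^2 \right], \]

where the norm $\|\cdot\|_{\Xi}$ is defined in (31).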
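The construction of $u$ from the drift difference can be checked numerically on a small example. The sketch below is illustrative only: it takes a hypothetical $3 \times 2$ full-rank matrix for $\sigma$, a drift difference lying in its range, and solves the normal equations $\sigma^T \sigma\, u = \sigma^T (b - b^\theta)$ with Cramer's rule (valid for $k = 2$ only). All function names are assumptions.

```python
def matT_vec(S, v):
    """Return S^T v for S given as a list of n rows of length k."""
    k = len(S[0])
    return [sum(S[i][j] * v[i] for i in range(len(S))) for j in range(k)]

def gram(S):
    """Return the k x k Gram matrix S^T S."""
    k = len(S[0])
    return [[sum(row[a] * row[b] for row in S) for b in range(k)] for a in range(k)]

def solve2(G, r):
    """Solve the 2 x 2 linear system G u = r by Cramer's rule."""
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    return [(r[0] * G[1][1] - G[0][1] * r[1]) / det,
            (G[0][0] * r[1] - G[1][0] * r[0]) / det]

def girsanov_u(S, b, b_theta):
    """u = (S^T S)^{-1} S^T (b - b_theta), assuming S has full column rank 2."""
    d = [bi - bti for bi, bti in zip(b, b_theta)]
    return solve2(gram(S), matT_vec(S, d))

sigma = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]   # hypothetical 3 x 2 diffusion matrix
b = [0.5, -2.0, -0.5]                          # equals sigma applied to (0.5, -1.0)
b_theta = [0.0, 0.0, 0.0]
u = girsanov_u(sigma, b, b_theta)
rer_integrand = 0.5 * sum(ui * ui for ui in u) # (1/2)|u|^2, the quantity averaged in the RER
print(u, rer_integrand)
```

Because the drift difference was chosen in the range of $\sigma$, the recovered $u$ reproduces it exactly; otherwise the same formula returns the least-squares solution.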


Appendix A.5. Proof of Theorem 5.1
a) From Corollary 14. ll|b|i . we have that 


where 


^ (■f’[o,r] I Qfo,T]) — iP[o,T] I Qfo,T]) r'o) , 


'^^{P[0,T] I Q[0,T]) = ^Plo,T] 


^ £ \\biX,) -b{Xs;9)\\lds 
As $\mu_0$, $\nu_0$ are independent of $\theta$, we only need to prove that


V^Ep, 


[0,T] 


^£\\b{X,)-b{Xs;6)\\lds 


= VgEp, 


[O.T] 


\\nb(Xs) -b{uXs;e)\\l,t^ds 


Recall the representation 

6 (x; 9) = n# 6 (nx; 9) + {I - U^U)y^{x) , and n(/ - U^U)y^{x) = 0 , 

for all X G M”. We also write b{x) = U^Ub{x ) + {I — Il^n) 6 (x) and have 

\\b{x)-b{x-,9)\\l = ||n»n(^6(x)-6(x;0)) III+ 11(7-n»n) (^ 6 (x)- 2 /^(a: 

= ||n ( 6 (x) - b{x- 0)) ||^„. + 11(7 - n«n) (bix) - y^ix 

for all X € M"", where the cross terms are zero as 11(7 — n^n)y*“(x) = 0. Thns from the 
assumption that y’"(x) is independent of 9 we have 


VqTI (-P[o,t] I Qfo,T]) = {P[o,T] I Qfo,T]) 


= XgEp, 


[O.T] 


\\ub{x,)-b{nXs-,9)\\l,,^ds 


and 


argmin^geTe ( P[o,r] | Q[o,t] ) = argminegeEp^^^^j 


/o 


||n6(A,)-6(nX,;0)||^„^7s 


b) We recall that for stationary process {Xt}t>o, see Corollary 14.11 


7^(P|Q'^) = -7^^ (^[o,T]iqo,r]) 


and from a) we have 


Ve^^(-P[o,T] I Q[o,t]) = '^eEpj^ yj 


thus 


V,7^(P|Q^) = ^V.Epj^^^, 




1 


||n6(x,)-6(nA,;0)||L^ds 


r||n6(X,)-6(nA,;0)||^„^7s 

Jo 


||n6(A)-6(nx;0)||2 


— VgEp 


-||n6(A)-6(nA;0)||^„^ 


where we used that {Xt}t>o is stationary with the invariant measure y{dx). Thus 


argmin0g0H(P | Q^) = argmin^ggE^ 


-||n6(X)-5(nA;0)||^„^ 


□ 
Appendix B. Relative entropy rate minimization for numerical schemes


The relative entropy rate $\mathcal{H}(P \,|\, Q^\theta)$, for two discrete time Markov chains with transition
probabilities $p^h(x,x')$ and $q^h_\theta(x,x')$ respectively is, [42],

\[ \mathcal{H}(P \,|\, Q^\theta) = \frac{1}{h} \int\!\!\int \mu(x)\, p^h(x,x') \log \frac{p^h(x,x')}{q^h_\theta(x,x')}\, dx\, dx'. \tag{B.1} \]

The RER is described here for the specific (stationary) Markov chains generated by the numerical
schemes for the Langevin (37) and the overdamped Langevin (47); thus formula (B.1) is a statistical
estimator of the RER, see [42].

In Section 7 the discrete time version of the relative entropy is introduced
on the coarse space for $Q^\theta$, eq. (50); here we use the same description, though on the microscopic
space, where the model distribution is a reconstruction of $Q^\theta$.


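Appendix B.1. Euler discretization for over-damped Langevin

The Euler discretization scheme for (47) with time step $h$ defines the discrete stochastic system

\[ x_{i+1} = x_i - \tfrac{1}{2} \Sigma(x_i) \nabla U(x_i)\, h + \tfrac{1}{2} \nabla \Sigma(x_i)\, h + \sigma(x_i)\, \Delta W_i, \]

with the solution given by the Markov chain $\{x_i\}_{i \ge 0}$ on the state space $\mathbb{R}^n$, where $\Delta W_i \sim N(0, h I_n)$ are
normally distributed increments. The transition probability density for the chain $\{x_i\}_i$ is

\[ p^h(x,x') = \frac{1}{Z^h(x)} \exp\left\{ -\frac{1}{2h} (x' - \Delta x)^T \Sigma^{-1}(x) (x' - \Delta x) \right\}, \]

where we denote

\[ \Delta x = x - \tfrac{1}{2} \Sigma(x) \nabla U(x)\, h + \tfrac{1}{2} \nabla \Sigma(x)\, h, \]

and $Z^h(x) = (2\pi h)^{n/2} |\Sigma(x)|^{1/2}$, with the notation $|A|$ for the determinant of the matrix $A$.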
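A one-dimensional sketch of the Euler transition density may help fix the roles of the mean $\Delta x$ and the normalization $Z^h(x)$. The choices $\Sigma(x) = 1 + x^2$ and $U(x) = x^2/2$ are hypothetical and serve only to exercise the state-dependent correction term; the function names are assumptions as well.

```python
import math

def euler_mean(x, h, Sigma, dSigma, dU):
    """Mean of the Euler step: x - (1/2) Sigma U' h + (1/2) Sigma' h (case n = 1)."""
    return x - 0.5 * Sigma(x) * dU(x) * h + 0.5 * dSigma(x) * h

def euler_density(x, x_next, h, Sigma, dSigma, dU):
    """Transition density p^h(x, x') with Z^h(x) = (2 pi h)^{1/2} Sigma(x)^{1/2}."""
    m = euler_mean(x, h, Sigma, dSigma, dU)
    Z = math.sqrt(2.0 * math.pi * h * Sigma(x))
    return math.exp(-(x_next - m) ** 2 / (2.0 * h * Sigma(x))) / Z

def total_mass(x, h, Sigma, dSigma, dU, n=4001):
    """Riemann sum of p^h(x, .) over +/- 10 standard deviations around the mean."""
    m = euler_mean(x, h, Sigma, dSigma, dU)
    s = math.sqrt(h * Sigma(x))
    dx = 20.0 * s / (n - 1)
    return dx * sum(euler_density(x, m - 10.0 * s + j * dx, h, Sigma, dSigma, dU)
                    for j in range(n))

Sigma = lambda x: 1.0 + x * x      # hypothetical state-dependent covariance
dSigma = lambda x: 2.0 * x
dU = lambda x: x                   # gradient of U(x) = x^2 / 2

print(euler_mean(0.7, 0.01, Sigma, dSigma, dU))
print(total_mass(0.7, 0.01, Sigma, dSigma, dU))
```

The second printed value should be close to one, confirming that the stated normalization constant matches the Gaussian kernel.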

Consider the linear CG map $\Pi : \mathbb{R}^n \to \mathbb{R}^m$, which for simplicity we assume to be an orthogonal
projection from the state space $\mathbb{R}^n$ onto $\mathbb{R}^m$ such that

\[ x = \Pi x + (I - \Pi) x = \bar{x} + \tilde{x}. \tag{B.2} \]

Note that we use the same letter $\Pi x \in \mathbb{R}^m$ for denoting the representation of $\Pi$ in $\mathbb{R}^m$.

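The reduced model based on the CG mapping $\Pi$ is given by $\{\bar{x}_i\}_i$, approximating the projected
Markov chain $\{\Pi x_i\}_i$, satisfying

\[ \bar{x}_{i+1} = \bar{x}_i - \tfrac{1}{2} \bar{\Sigma}(\bar{x}_i; \theta) \nabla \bar{U}(\bar{x}_i; \theta)\, h + \tfrac{1}{2} \nabla \bar{\Sigma}(\bar{x}_i; \theta)\, h + \Delta \bar{W}_i, \]

where $\Delta \bar{W}_i \sim N(0, h \bar{\Sigma}(\bar{x}_i; \theta))$. Hence the transition probability density for the CG chain $\{\bar{x}_i\}_i$ is

\[ \bar{p}_\theta(\bar{x}, \bar{x}') = \frac{1}{\bar{Z}^h(\bar{x}; \theta)} \exp\left\{ -\frac{1}{2h} (\bar{x}' - \Delta_\theta \bar{x})^T \bar{\Sigma}^{-1}(\bar{x}; \theta) (\bar{x}' - \Delta_\theta \bar{x}) \right\}, \]

where

\[ \Delta_\theta \bar{x} = \bar{x} - \tfrac{1}{2} \bar{\Sigma}(\bar{x}; \theta) \nabla \bar{U}(\bar{x}; \theta)\, h + \tfrac{1}{2} \nabla \bar{\Sigma}(\bar{x}; \theta)\, h, \]

and $\bar{Z}^h(\bar{x}; \theta) = (2\pi h)^{m/2} |\bar{\Sigma}(\bar{x}; \theta)|^{1/2}$. We consider the reconstructed chain of $\{\bar{x}_i\}_{i \ge 0}$ that
has the transition probability density

\[ q_\theta(x, x') = \nu(x' \,|\, \bar{x}')\, \bar{p}_\theta(\bar{x}, \bar{x}'), \]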

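where $\nu$ is a probability measure associated with the reconstruction, that we assume independent
of the parameter $\theta$. Note that we can write the covariance matrix $\Sigma(x)$ as

\[ \Sigma(x) = \begin{pmatrix} \Sigma_{11}(x) & \Sigma_{12}(x) \\ \Sigma_{21}(x) & \Sigma_{22}(x) \end{pmatrix}, \]

with $\Sigma_{11}(x) = \Pi \Sigma(x) \Pi^T \in \mathbb{R}^{m \times m}$, $\Sigma_{22}(x) = \Pi^\perp \Sigma(x) \Pi^{\perp T} \in \mathbb{R}^{(n-m) \times (n-m)}$, $\Pi^\perp = (I - \Pi)$ and
$\Sigma_{21}(x) = \Sigma_{12}^T(x)$. Then we can rewrite $p^h(x,x')$ as

\[ p^h(x, x') = p_1(x, \bar{x}')\, p_2(x, \tilde{x}' \,|\, \bar{x}'), \tag{B.3} \]

where

\[ p_1(x, \bar{x}') = \frac{1}{Z_1^h(x)} \exp\left\{ -\frac{1}{2h} (\bar{x}' - \Pi \Delta x)^T \Sigma_{11}^{-1}(x) (\bar{x}' - \Pi \Delta x) \right\}, \]

$Z_1^h(x) = (2\pi h)^{m/2} |\Sigma_{11}(x)|^{1/2}$ and

\[ p_2(x, \tilde{x}' \,|\, \bar{x}') = \frac{1}{Z_2^h(x)} \exp\left\{ -\frac{1}{2h} \big(\tilde{x}' - g(x, \bar{x}')\big)^T \tilde{\Sigma}^{-1}(x) \big(\tilde{x}' - g(x, \bar{x}')\big) \right\}, \]

where

\[ \tilde{\Sigma}(x) = \Sigma_{22}(x) - \Sigma_{12}^T(x) \Sigma_{11}^{-1}(x) \Sigma_{12}(x) \in \mathbb{R}^{(n-m) \times (n-m)}, \]
\[ g(x, \bar{x}') = \Pi^\perp \Delta x + \Sigma_{12}^T(x) \Sigma_{11}^{-1}(x) (\bar{x}' - \Pi \Delta x), \]

and $Z_2^h(x) = (2\pi h)^{(n-m)/2} |\tilde{\Sigma}(x)|^{1/2}$.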
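The factorization (B.3) is the usual marginal-times-conditional splitting of a Gaussian, with the Schur complement as conditional covariance. A minimal numerical check for $n = 2$, $m = 1$ is sketched below; the covariance entries and the evaluation point are arbitrary illustrative values, `d1` and `d2` stand for $\bar{x}' - \Pi \Delta x$ and $\tilde{x}' - \Pi^\perp \Delta x$, and the function names are hypothetical.

```python
import math

def joint_density(d1, d2, s11, s12, s22, h):
    """Density of N(0, h*Sigma) at (d1, d2), with Sigma = [[s11, s12], [s12, s22]]."""
    det = s11 * s22 - s12 * s12
    quad = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-quad / (2.0 * h)) / (2.0 * math.pi * h * math.sqrt(det))

def factored_density(d1, d2, s11, s12, s22, h):
    """p1 * p2: marginal of the coarse component times conditional of the remainder."""
    p1 = math.exp(-d1 * d1 / (2.0 * h * s11)) / math.sqrt(2.0 * math.pi * h * s11)
    s_tilde = s22 - s12 * s12 / s11        # Schur complement (conditional covariance)
    shift = s12 / s11 * d1                 # conditional mean relative to the fine mean
    p2 = (math.exp(-(d2 - shift) ** 2 / (2.0 * h * s_tilde))
          / math.sqrt(2.0 * math.pi * h * s_tilde))
    return p1 * p2

args = (0.03, -0.05, 2.0, 0.6, 1.5, 0.01)
print(joint_density(*args), factored_density(*args))
```

The two printed numbers agree up to rounding, which is the scalar version of the identity used to split $p^h$ into $p_1$ and $p_2$.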

The variational problem for the best-fit of parameters in terms of the relative entropy rate
between the two Markov chains is described by the following theorem, which reveals
its relation to a weighted force-matching optimization.


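Theorem 1. Given $h > 0$,

a) \[ \arg\min_{\theta \in \Theta} \mathcal{H}(P \,|\, Q^\theta) = \arg\min_{\theta \in \Theta} \left[ \frac{1}{h} A(\theta) + B(\theta) \right], \]

where

\[ A(\theta) = \frac{1}{2} \int \left[ -\log \big| \Pi \Sigma(x) \Pi^T \bar{\Sigma}^{-1}(\Pi x; \theta) \big| + \mathrm{Tr}\big( \Pi \Sigma(x) \Pi^T \bar{\Sigma}^{-1}(\Pi x; \theta) \big) \right] \mu(x)\, dx, \]

\[ B(\theta) = \frac{1}{2} \int \big( \Pi b(x) - \bar{b}(\Pi x; \theta) \big)^T \bar{\Sigma}^{-1}(\Pi x; \theta) \big( \Pi b(x) - \bar{b}(\Pi x; \theta) \big) \mu(x)\, dx, \]

with

\[ b(x) = -\tfrac{1}{2} \Sigma(x) \nabla U(x) + \tfrac{1}{2} \nabla \Sigma(x), \]
\[ \bar{b}(\bar{x}; \theta) = -\tfrac{1}{2} \bar{\Sigma}(\bar{x}; \theta) \nabla \bar{U}(\bar{x}; \theta) + \tfrac{1}{2} \nabla \bar{\Sigma}(\bar{x}; \theta). \]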
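The two functionals in part a) can be estimated by sample averages over the invariant measure. The sketch below does this for a toy case with a scalar coarse variable ($m = 1$); the fine-scale model (identity covariance, potential $U = (x_1^2 + x_2^2)/2 + c\, x_1 x_2$, projection $\Pi x = x_1$) and the coarse model ($\bar{\Sigma} = s$, $\bar{U} = \kappa \bar{x}^2/2$) are assumptions chosen so that samples can be drawn exactly, and the function names are hypothetical.

```python
import math
import random

def A_hat(samples, Sigma11, Sigma_bar):
    """Sample average of (1/2)[-log r + r], r = Sigma11(x) / Sigma_bar(x), for m = 1."""
    tot = 0.0
    for x in samples:
        r = Sigma11(x) / Sigma_bar(x)
        tot += 0.5 * (-math.log(r) + r)
    return tot / len(samples)

def B_hat(samples, Pi_b, b_bar, Sigma_bar):
    """Sample average of (1/2)(Pi b - b_bar)^2 / Sigma_bar: weighted force matching."""
    return sum(0.5 * (Pi_b(x) - b_bar(x)) ** 2 / Sigma_bar(x) for x in samples) / len(samples)

# Draw from the Gaussian invariant measure with precision [[1, c], [c, 1]].
c = 0.5
rng = random.Random(3)
samples = []
for _ in range(20000):
    x1 = rng.gauss(0.0, math.sqrt(1.0 / (1.0 - c * c)))
    x2 = rng.gauss(-c * x1, 1.0)
    samples.append((x1, x2))

Pi_b = lambda x: -0.5 * (x[0] + c * x[1])            # first component of b = -(1/2) grad U

def B_of_kappa(kappa, s=1.0):
    return B_hat(samples, Pi_b, lambda x: -0.5 * s * kappa * x[0], lambda x: s)

print(A_hat(samples, lambda x: 1.0, lambda x: 1.0))  # minimal value of A, attained at s = 1
print(B_of_kappa(1.0), B_of_kappa(1.0 - c * c))
```

In this toy setting the force-matching term is smaller at $\kappa = 1 - c^2$ than at $\kappa = 1$, reflecting that the best coarse drift matches the conditional mean of the projected fine drift.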
b) In the limit $h \to 0$ we have


lim min —A(9) + B{9) 
h^o 6 »ee h 


= min \E{9) + ml2] 

0G0 


(B.4) 


where E{9) = -B(^)|g=ns(x)n*'-) is 

E{9) = ^j nix) (n6(x) - 6(x; 9)^ (nS(a:)n‘'')-^ (n6(x) - 6(x; 9)) dx 


where 


b{x;9) = -ins(x)n*'’vi/(x;0) + ivns(x)n*^ 


Proof. a) Recall the definition of the relative entropy rate ^(P|Q®), |42l |. 

niPlQ^) = t[[ At(x)p^(x, x') log dx'dx . 

^ J J QqK^I X ) 

From the definition of , qg and the factorization of given in (IB.3(1 , we get that 

p^ix,x') pi{x,x')p2{x,x'\x') 

-R = log- /- -n , /i-n = 

qgix,x') peix,x')vix'\x') 

^ ^ ^ " nAx)*"Sf/(x)(x' - nAx)] 

+ log ^ [(^' - A 0 x)*^S-^(x; 9){x' - A^x)] . 

Note that the hrst two terms are 9 independent, therefore they will not contribute to the minimiza¬ 
tion problem Ve'HiP\P^) = 0. Thus 


hVg'HiP\Q^) = Ve 


X log 


nix)piix,x')p2ix,x'\x') X 

^ J_ ijjji;' — A 0 x)*'’S“^(x; 9)ix' — A^x)] | dx'dx'dx 


Zf(x) ' 2h 
= Vgl + Veil 


I = 


n{x)pi{x, x') log ^ dx'dx = ]- f //(x)log|S(x; 6 »)(nS(x)n*^) ^\dx. 
Z^[x) ^ J 


II = j J d-{x)piix,x') [{x' — Agx)'^''E ^ix]9)ix' — Agx)~\ dx'dx 
= ;u(x) [(nAx — A 0 x)*^S“^(x; 0)(nAx — A^x)] dx-I-III 
where the first term on the right hand side in the previous equality is zero and


III = — [ (nix) _ - _ 

2hJ J 

X [(x' — nAx)*'’S“^(x; 9){x' — IIAx)] dx'dx . 

Since S is positive dehnite and symmetric we can write 9) = D^''{x] 9)D{x] 9), and perform 

the change of variables s = D[x' — IIAx) in the integral appearing in III. Then 


III 


2hJ J ^ Z’^{x)\D{x;9)\ 

i j ix(x)Tr (S-^(x;0)nS(x)n*^)dx. 


We use Tr(-) to denote the trace of a matrix. Concluding, a local minimizer 9* of 'H{P\P^) is given 
as the solution of \/eP{P\Q^) = 0 where 


hVen{P\Q^) = Ve 


i j /i(x)log|S(x;0)(nS(x)n*’')-V® + 

j //(x)Tr (nS(x)n*’'S-^(x;0))dx + 



fi{x) (IIAx — AqxY^ S ^(x; 9) (IIAx — A^x) dx 


= Ve 


Ia{9) + B{9) 


b) We have that lim min + 5(0)1 = min \B(9)] where ^ = min 0 ^( 0 ). This 

h^oeee ^ ^ {eee-.A{e)=f} 

relation can be easily proved using optimality arguments as /i —)■ 0. Note that ^(0) = y when 

s(x;0) = ^s(x)^‘^ 

The proof is thus completed if we substitute the above relation in the form of 5(0). □ 

Appendix B.2. The BBK scheme for Langevin dynamics 

The explicit Euler-Maruyama-Verlet followed by implicit Euler-Maruyama scheme, also known
as the BBK scheme, is applied for the discretization of the Langevin dynamics (37),

\[ \begin{cases}
p_{i+1/2} = p_i - F(q_i) \frac{h}{2} - \gamma M^{-1} p_i \frac{h}{2} + \sigma \Delta W_i, \\
q_{i+1} = q_i + M^{-1} p_{i+1/2}\, h, \\
p_{i+1} = p_{i+1/2} - F(q_{i+1}) \frac{h}{2} - \gamma M^{-1} p_{i+1} \frac{h}{2} + \sigma \Delta W_{i+1/2},
\end{cases} \]

with $\Delta W_i$, $\Delta W_{i+1/2}$ a sequence of i.i.d. Gaussian random vectors with mean zero and covariance
$\frac{h}{2} I_n$, where for notational simplicity we set $n = 3N$. For simplicity we take $\gamma = \gamma I_n$ to be constant and
similarly $\sigma = \sigma I_n$. Also we set the same mass $M = 1$ for all particles. The discretized Langevin
process $(q_i, p_i)$ is a Markov chain, with transition probability from the state $(q,p)$ to $(q',p')$ given
by

\[ P^h(q, p, q', p') = P^h(q' \,|\, q, p)\, P^h(p' \,|\, q', q, p), \]
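One step of the scheme can be written out explicitly once the implicit last stage is solved for $p_{i+1}$, namely $p_{i+1}(1 + \gamma h/2) = p_{i+1/2} - F(q_{i+1}) h/2 + \sigma \Delta W_{i+1/2}$. The sketch below assumes one degree of freedom and unit mass; the function name `bbk_step` and the harmonic choice of $F$ in the usage lines are illustrative only.

```python
import math
import random

def bbk_step(q, p, h, force, gamma, sigma, rng):
    """One BBK step for unit mass in one dimension.

    `force` plays the role of F in the scheme, so it enters with a minus sign.
    """
    s = math.sqrt(0.5 * h)   # each Gaussian increment has variance h/2
    p_half = p - force(q) * 0.5 * h - gamma * p * 0.5 * h + sigma * s * rng.gauss(0.0, 1.0)
    q_new = q + p_half * h
    # The last stage is implicit in p_new; for constant gamma it is solved in closed form.
    rhs = p_half - force(q_new) * 0.5 * h + sigma * s * rng.gauss(0.0, 1.0)
    p_new = rhs / (1.0 + 0.5 * gamma * h)
    return q_new, p_new

rng = random.Random(4)
q, p = 1.0, 0.0
for _ in range(1000):
    q, p = bbk_step(q, p, 0.01, lambda x: x, gamma=1.0, sigma=1.0, rng=rng)
print(q, p)
```

With `gamma = 0` and `sigma = 0` the step reduces to velocity Verlet, which gives a quick sanity check of the implementation.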

where 

P^{q'\q,p) = ^exp|--^|g'-A(f(g,p)| 2 | and 

P^{p'\q,q,p) = ’ 


where we introduce the notation 


^Uq^p) = q-h 


\h h'y 

p- Piq)^+PY 




and the normalizing constants Zq = and ^ reduced discrete 

model based on the CG mapping 11 given by the projected process {Y\.qi,Ylpi) € M” x M” is 
approximated by the Markov chain {qi,Pi) € M™' x M™ satisfying 


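\[ \begin{cases}
\bar{p}_{i+1/2} = \bar{p}_i - \bar{F}(\bar{q}_i; \theta) \frac{h}{2} - \gamma \bar{p}_i \frac{h}{2} + \sigma \Delta \bar{W}_i, \\
\bar{q}_{i+1} = \bar{q}_i + \bar{p}_{i+1/2}\, h, \\
\bar{p}_{i+1} = \bar{p}_{i+1/2} - \bar{F}(\bar{q}_{i+1}; \theta) \frac{h}{2} - \gamma \bar{p}_{i+1} \frac{h}{2} + \sigma \Delta \bar{W}_{i+1/2},
\end{cases} \]

where $\Delta \bar{W}_i, \Delta \bar{W}_{i+1/2} \sim N(0, \frac{h}{2} I_m)$. The Markov chain $(\bar{q}_i, \bar{p}_i)$ has transition probabilities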

Peiq,P,q',p') = Pe\q'\q,P)Pe\p'\q',q,P), 

where 


1 


1 


( 9 'k,p) = ^exp|-- 3 ^|g'-Ao(g,p)| '> and 


Pe{p'W,q,p) = ^exp<j-^b'(i + ^) - ^i{q,^)\ 


Z^ 


7 /i^ 


a^h 


with Aq((7,p) = q — h p — F{q] 0)| + p^ and A]*(g, q') = ^(q' — q) — ^F{q'] 0 ), and normalizing 

constants Zq = (vr/i^cr^)™''^^ , ^ ^ ^ • We denote with i^{q',p'\q',p',q,p) a probability 

measure associated with the reconstruction map and assume that is independent of the parameters 
6. Then the reconstructed chain {qi,Pi) from {qi,Pi) has transition probability 


Qe{q,p,q',p') = P8iq^p^q'^p')^iq'iP'W^p'iq^p) 


Theorem 2. Given $h > 0$ the variational problem for the best-fit of parameters is

\[ \arg\min_\theta \mathcal{H}(P^h \,|\, Q^h_\theta) = \arg\min_\theta \left[ C(\theta) + D_h(\theta) \right], \tag{B.5} \]

where


c{e) = 

DhiO) = 


-1 


O' 


{UF{q)-Fiq-9))f 


Furthermore, as /i ^ 0 we have the optimality condition 




lim Ven{P^\Q^) = -V,E^ 
/i—>-0 4 


O' 


-1 


(nF(<?)-F(ng; 0 ))|' 


(B.6) 


Proof. Based on the orthogonality of the coarse graining mapping 11 we can factorize P^{q'\q,p) 
and P^{p'\q',q,p) as 

P’^{q'\q,p) = 

^0 


and 


P^ip'W,q,p) 




where x denotes n-*-x = (/ — n)x, for 11 a projection, nAQ(q,p) = q — h p — IlF{q)^ + p^ and 

nA^(g,(?') = i(r -q)- |nF(g') , Ag(g,p) and n-’-A^(g, g') are similarly dehned. From the 
definition of the RER functional P{P^\Pg) and the fact that ip{q',p'\q',p',q,p) is independent of 
the parameters 9, we have 


hVenPVe] 


= Vfi 


{j ■■■ f P^{q,P,Q',p')^og- 
= Vel + Veil 


^■^^^)dq'dp'p{dq, dp)} 


where 


I = 


II = 


1 Zf 

0 - 2/1 


1 Z'l 
a‘^h^ Z^ 


e 




h 


p'{l + j-)-A’l{q,q') 


dq'dp'dq'p{dq, dp) 


6 


|p'(l+^)-nAj(g,q')Pp/xi„/ 




q' - ^0 {q, P) dq'dp'dq'p{dq, dp ), 


and 


Zj* = 


TTha^ \ 


(1+7W 
After integrating with respect to dp' and dq' we obtain


II = 


1 1 
0 - 2 /l 3 J 

1 1 
a'^h^ j 

2 1 


+ 


a- 


+ 


1 1 


Cj2/l3 Zg 


e-^k-'-nA5(q,p)|2 


" k^—nAg (g,p)p 


9 '-Ao(g,p) dq'p{dq,dp) 


nAo(g,p) - Ao(g,p) dqp{dq,dp) 


re-^l9'-nAj(g,p)lY^' - A 


- ^0 P) nAg(g, p) - Ag(g, p) dq'p{dq, dp) 


e-^k-'-nA5(q,p)|^ 


^ - nAg (g, p) dq'p{dq, dp). 


Therefore 


Veil = Ve 
= Ve 
= Ve 


a- 


1 I 


e cy'^h'i 


|9'-nAj(g,p)|2 


nAg(g,p) - A^(g,p) d^p{dq,dp) 


I 


cr^/i^ 

h 
4 


cr 


nAg (g, p) - Ag (g, p) p{dq, dp) 
-\llF{q)-F{q-e))\%{dq,dp)\ . 


Similarly, 

Vel = Ve\h^ jjj e-^\^'-^'^°^^’^'^\"\a-\nF{q')-F{q'-d))\^dq'p{dq,dp)'^ . 

Summarizing all steps we get 


ye'H{P^\Q^e) = Ve <> 






+ Ve <j -E^ 


|CT-^(nF(g') -T(g';0))|^dg' 


\a-\nF{q) - F{q-e))f 


Furthermore, as h ^ 0^ we have that ie ol'S-p)! weakly, where <5 denotes 

the Dirac distribution. Thus 


hm Ve?^(P'*|Q^) = Ej|a-i(nF(g)-F(g;0))| 


h-i-O 


+ 7^/^ 


-1 


(7 


(nF(g)-F(g-;0))f 


= -E„ 


<7 


-1 


(nF(g)-F(ng;0))|' 


which concludes the proof. 


□ 